Talk 37:00

Observability at Scale

Observability is about understanding your system: how it performs, why it acts, fails to act, or fails outright, and where its limits lie, so that you can act on those limits before they become a problem. The main tools of observability are metrics and logs. To empower developers rather than burden them with overhead and endless useless data, both metrics and logs should be designed into the system rather than added on afterwards. This is particularly true for large-scale systems.

In this talk, I will share my experience and rules of thumb for working with metrics and logs at scale. I will also cover the theory behind these concepts.