Geek read: Cloud Observability in Action by Michael Hausenblas
Intro
I am super interested in observability, and I share my o11y journey in blog posts and conference talks. I was even once invited to the “OpenTelemetry in Use” podcast by Michael Hausenblas, the author of the book I am reviewing.
So here is a disclaimer: I know the book’s author, who even signed my copy, but there will be no preferential treatment. I met Michael when we discussed the architecture we planned to use in our OTEL collectors setup. I read his observability newsletter (https://o11y.news/), and I simply wanted to read his book. Here are my thoughts about it.
Structure
The book is 230 pages long and divided into 11 chapters. It contains plenty of visuals and code snippets, and the author's thoughts are clear. The journey through observability concepts is well thought out, and there are no repetitions. I would say that it’s a light read.
What can I learn?
We start with the basics:
- What is observability, and why do we need it
- Common terms used in the observability domain
- Deep dive into signals (if you are now asking yourself, “What signals?”, you should read this book; we are talking about logs, metrics, traces, etc.)
- types of signals with all their pros and cons
- costs related to instrumentation, transfer, storage, and performance with many fantastic insights and things to consider
- instrumentation best practices
- things to watch out for (logs context & structure, cardinality explosion, signals hoarding, compliance, and many others)
- their B2I ratio, which I had never come across before
The B2I (business logic to instrumentation) ratio is a measure that lets you quantify the “instrumentation impact” and is calculated as:
total lines of code after / total lines of code before instrumentation
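As a rough illustration of the ratio as described above, here is a minimal Python sketch. The function name `b2i_ratio` and the sample line counts are my own assumptions for illustration; they are not taken from the book.

```python
def b2i_ratio(loc_before: int, loc_after: int) -> float:
    """Total lines of code after instrumentation divided by lines before.

    Hypothetical helper that follows the formula as stated in this review.
    """
    if loc_before <= 0:
        raise ValueError("loc_before must be a positive number of lines")
    return loc_after / loc_before


# Made-up numbers: a 400-line service grows to 460 lines once instrumented.
ratio = b2i_ratio(loc_before=400, loc_after=460)
print(f"B2I ratio: {ratio:.2f}")
```

With this reading of the formula, a value close to 1 means the instrumentation added little code on top of the business logic.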
Then we are going into more complex topics:
- How to set up agents for the collection of signals, and an overview of the solutions landscape
- How the OpenTelemetry Collector works, why it is so essential for modern telemetry solutions, and how to set it up
- How you can instrument your code
- Where can we store our signals? The author provides an overview of cloud, open-source, and commercial backends for logs, metrics, and traces. The author also discusses how data is stored in different stores and why, which is great!
I especially liked the part about TSDBs (time series databases), where the author describes how your precious metrics data is stored. If cardinality explosion was a mystery to you, you will clearly understand why it is a thing after reading this book.
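To give a feel for why cardinality explodes, here is a small sketch of the usual back-of-the-envelope estimate: the upper bound on time series for one metric is the product of the number of distinct values of each label. The label names and counts below are hypothetical, and real systems usually see fewer series than this bound because not every combination occurs.

```python
from math import prod

# Hypothetical metric labels and how many distinct values each can take.
label_cardinalities = {
    "http_method": 5,
    "status_code": 10,
    "endpoint": 50,
}


def max_series(cardinalities: dict[str, int]) -> int:
    """Upper bound on time series for one metric: product of label cardinalities."""
    return prod(cardinalities.values())


print(max_series(label_cardinalities))  # 5 * 10 * 50 = 2500

# Adding one unbounded label, such as a user ID, multiplies the total.
with_user_id = {**label_cardinalities, "user_id": 100_000}
print(max_series(with_user_id))  # 2500 * 100_000 = 250_000_000
```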
- How can we visualize our collected telemetry using different observability frontends? The author provides an overview of the many products available in the space and discusses which one to pick for your setup.
- How to set up alarms based on our collected telemetry data
In my experience, the biggest issue related to alarms is the art of defining them, especially alarm thresholds that won’t cause alarm fatigue. It’s not a task; it’s a journey. You might not agree with me, but that’s the biggest challenge in the observability space. I would have loved to see some strong guidelines on defining them in this book.
- What are the models of handling incidents
- Detailed deep dive on distributed tracing
- The concept of developer observability, which is related to the shift left in the o11y space, equips developers with tooling to have shorter feedback loops.
- Continuous profiling: why, how, tooling, and overview of the solutions in the space
- What are SLIs, SLAs & SLOs: what do they mean, why do we need them, who owns them, how to define them, and why you need an error budget
- How do different types of signals correlate with each other, and why is it crucial for a successful observability setup?
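The error budget mentioned in the SLI/SLA/SLO item above is easy to make concrete. Below is a minimal sketch of the common availability-based calculation, assuming a 99.9% SLO over a 30-day window; the function names and numbers are my own illustration, not the book's.

```python
def error_budget_minutes(slo_target: float, window_days: int) -> float:
    """Minutes of allowed unavailability for a given SLO over a time window."""
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be a fraction between 0 and 1")
    window_minutes = window_days * 24 * 60
    return (1 - slo_target) * window_minutes


def budget_remaining(slo_target: float, window_days: int, downtime_minutes: float) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    budget = error_budget_minutes(slo_target, window_days)
    return 1 - downtime_minutes / budget


# Assumed example: 99.9% availability SLO over a 30-day window.
budget = error_budget_minutes(0.999, 30)
print(f"Error budget: {budget:.1f} minutes")
print(f"Remaining after 10 minutes down: {budget_remaining(0.999, 30, 10):.0%}")
```

The point of the budget is that it turns an abstract percentage into a number of minutes a team can actually spend on incidents and risky changes.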
Summary
To sum up, it’s a really good book that feels like riding with a great guide on an observability safari, where you can familiarize yourself with the whole landscape of products and concepts. It’s not super long, and there is plenty of great content, especially for beginners but also for more advanced users in the space.
I consider myself one of the “more advanced users in the space,” and I still filled gaps in my knowledge of the basics and learned a couple of new things. I especially liked the chapter about SLOs; it helped me realize what I could do better to solve my issues.
If I could add one chapter to the book, it would be about large-scale deployments in the cloud environment. A blog post about a multi-account OTEL collector setup on AWS is coming soon to my blog.
There is one thing I don’t like about the book. Because of its many code snippets and “implementation details,” I felt it might not age well. In an area as dynamic as observability is right now, this might be a problem. I asked the author about it, and he pointed out that this book was published as part of the “In Action” series, which makes sense.
Does this mean that you shouldn’t read it? Definitely not! It has amazing content on observability theory and is full of sound advice and insightful considerations about building your o11y strategy. There are also plenty of great questions asked that could be combined into some “o11y checklist” for builders.
Where can I get it?
If I didn’t encourage you enough, you can listen to the author talking about the book here:
You can buy the book here: