Chaos Engineering: System Resiliency in Practice

Edition: 1st
Pages: 308
Language: English
ISBN-10: 1492043869
ISBN-13: 9781492043867
28 Apr

As more companies move toward microservices and other distributed technologies, the complexity of these systems increases. You can't remove the complexity, but through Chaos Engineering you can discover vulnerabilities and prevent outages before they impact your customers. This practical guide shows engineers how to navigate complex systems while optimizing to meet business goals.

Two of the field's prominent figures, Casey Rosenthal and Nora Jones, pioneered the discipline while working together at Netflix. In this book, they expound on the what, how, and why of Chaos Engineering while facilitating a conversation among practitioners across industries. Many chapters are written by contributing authors to widen the perspective across verticals within (and beyond) the software industry.

  • Learn how Chaos Engineering enables your organization to navigate complexity
  • Explore a methodology to avoid failures within your application, network, and infrastructure
  • Move from theory to practice through real-world stories from industry experts at Google, Microsoft, Slack, and LinkedIn, among others
  • Establish a framework for thinking about complexity within software systems
  • Design a Chaos Engineering program around game days and move toward highly targeted, automated experiments
  • Learn how to design continuous collaborative chaos experiments

Reviews (95)

excellent and varied real world examples

This is a very accessible (and quotable!) overview of the principles and practices of chaos engineering. I also really enjoyed the case study chapters. The book contains a wide variety of case studies, and each one felt authentic and real. This book feels like it is meant to build community as much as educate technologists. I think that is super important at this point in the evolution of the chaos engineering and resilience engineering communities.

Fascinating concepts - even for non-engineers

I am not a software engineer or coder, but I found the concepts in this book really interesting. The examples used to illustrate the use of chaos engineering begin very simply and increase in complexity, allowing you to follow along even if you don't know the lingo used by coders. Now that I know more about chaos engineering, I think differently every time I read the news about a major outage at Netflix or Google, for example. The key concept that really grabbed me is that in complex computer systems, there is no way for a single human brain to understand how it all works. Therefore, there is no way to predict the outcome as an infinite number of variables come into play. The system itself must be made resilient enough to detect and adjust when something is about to crash.

Highly recommended book for all engineers - "Resilience is created by people."

As the lead of a team of engineers responsible for mitigating risks and ensuring the "high availability" of multiple products, I found this book to be a great resource for understanding and leveraging Chaos principles in my profession. I highly recommend this well-written book to those in software design, development, or quality engineering who want to learn more about how Chaos Engineering can help you proactively surface possible risks and unknowns by leveraging well-explained methodologies and by exploring and unearthing critical pathways, which can prevent crashes and unpredictable behaviors your customers will (hopefully not) encounter. The book provides details and explains the theories, including experiments, while offering insight on the "hows" and "whys", driving home the point that learning from "failures" and even inducing "faults" in the system are so critical.

An informative and "enjoyable" read!

Chaos Engineering deserves 5 stars for both readability and content. I am certainly a novice in the field, but I was able to grasp the concepts laid out in the book rather than feeling out of my depth. What is more, it is a very enjoyable read. I recommend this book to anyone working in, or interested in, software engineering, as well as fans of the Chaos Community Broadcast (YouTube). The industry certainly seems to be leaning heavily into chaos engineering and continuous verification at all levels, from small startups to enterprise business.

This book changes the way you think of systems

Chaos Engineering is not about breaking all the things or wreaking havoc in production. Chaos Engineering is a discipline that helps navigate the inherent complexity in our systems. This book is packed with insight from engineering leaders at Google, Slack, and LinkedIn in addition to the authors' experience at Netflix. The chapters Security Chaos Engineering, Continuous Verification, ROI of CE, and People in the Loop were all very helpful in addition to the more case study style chapters. If you are new to chaos engineering or if you are taking your resilience and reliability engineering to the next level, you need to read this book. Disclaimer: I am not an impartial reviewer. I was a reviewer for the Security Chaos Engineering chapter, and as this book was being written and published, I joined the chaos engineering startup (verica.io) that was founded by one of the authors of the book.

establishes Chaos Engineering as a paradigm sprouted from roots in Site Reliability Engineering/SRE

Although the title might suggest this is an anarchist's approach to Site Reliability Engineering (SRE), this book expresses the dynamics of applying functional safety practices (like FMEA) to complex systems. It is a subset of SRE, and although the book in a couple of places tries to differentiate itself completely from SRE, it extends the SRE paradigm in great detail in a way that can be applied to almost any complex process, even medical and financial ones, which one might traditionally consider too risky for such things. This book looks at the process (system) from all practical perspectives in order to give the reader a good sense of how Chaos Engineering might be applied to their very different (and maybe unique) complex system. A must read. Just know that verify, validate, and test have different definitions in different contexts, and the authors' definitions are just to differentiate specific processes for part of the book.

Great In Depth Exploration Of The Topic

When people first hear about chaos experimentation, usually through Netflix's Chaos Monkey, it blows their minds: it seems like a super cool idea, very "Netflix dramatic," they are sure elite engineers, but that's probably not for me and my shop... But with a thorough understanding of the principles underlying it, you can perform safe and helpful resilience experiments anywhere. This book explains the theory, goes through a bunch of case studies of real companies from Capital One to LinkedIn doing chaos experimentation, and presents a maturity model and many practical tips on how to start using experiments to determine the safe boundaries of your systems. If you're serious about ensuring the resilience of your systems, this is a new development that you can't ignore, and this book does an extremely thorough job of explaining the concepts and practice behind chaos engineering.

Reality based engineering for modern internet distributed applications

It's ironic that so many organizations have lost sight of their business processes being built on top of a communications protocol (TCP/IP) which was designed for resilience and adaptation to expected failure. This has led to some embarrassing transitions from legacy client/server architectures. The complexity and potential paths for failure in even a relatively simple distributed application are beyond the traditional management models for QA, security, and disaster recovery modeling. Chaos Engineering is a realistic, practical, and thorough guide to reintroduce technology systems management to the original principles of the internet: that failure is intrinsic to distributed computing. So embrace, plan, and adapt. Chaos Engineering blasts past the shiny acronyms to prescribe reality-based engineering. If you are in any way responsible for or part of the modern application process, I recommend you read this book. (Disclaimer: I was provided a complimentary copy while attending a recent AllDayDevOps conference. This review is based, though, on my 30+ years in engineering, analyzing, and securing internet-based systems, from the ARPANET era to cloud microservices.)

An excellent resource for improving resiliency

Chaos Engineering is an excellent resource that provides a comprehensive view of this titular modern practice which has become so crucial for successful distributed systems. Throughout the book, Casey and Nora explain the history of Chaos Engineering, how to implement it in your own complex systems, and how to scale its experiment-driven approach to a large environment. But they go beyond that as well, with two particular sections that stood out to me. The first is the concrete examples they provide from major companies like Slack, LinkedIn, and Capital One. These really helped illustrate the principles with real-world references. The second was the commentary on business factors and the ROI of chaos engineering. This is critical, as getting the business to buy in to willfully breaking things can be a hurdle if you don't know how to effectively communicate the benefits. Overall this book addressed everything I was looking for and more, and left me eager to begin my own resiliency experiments.

Great for team leads or for providing solid information when proposing Chaos Engineering to an org

I come from a developer background, so my initial expectation when getting hold of this book was that it would be more practical, focusing on the technology step by step. Instead, it gives a broader view of the theory behind chaos engineering, the supporting evidence to validate its need, and organizational steps to implement it. To be honest, I believe that for a book, this second approach offers greater value. Focusing on tech specifics would get outdated quickly, and that info can be easily searched online. From my personal experience, the biggest challenge of selling internally and implementing a new approach like Chaos Engineering is how to address cultural change. Corporate inertia is pretty hard to fight, so this book will equip you with the arguments, with references to other companies and success cases, and provide an action plan on how to implement it in an org. Highly recommended!
