Chaos Engineering on CI/CD Pipelines – InApps Technology 2022

March 31, 2022 by Phu Nguyen

Principles of Chaos Engineering

In chaos engineering, you run planned and thoughtful experiments that generate new knowledge about a system’s performance, properties and behaviors in the event of a failure.

The following points summarize some principles you need to follow when running chaos experiments:

Define Metrics for the Steady State of Your System

To successfully run chaos experiments, you need to define the metrics that indicate your application’s behavior in normal conditions. A system’s steady state depends on its use case and purpose. A good understanding of that steady state lets you track and monitor your system, and recognize how its behavior changes when it encounters a failure.

When you define your system’s steady state, business metrics are often more useful than purely technical metrics, because they say more about an application’s health as customers experience it. They’re also better suited to measuring customer operations or experience. For example, Netflix uses “stream starts per second” to track how often its users press the play button on a streaming device. Other examples of business metrics are the number of declined transactions per minute, searches per hour, failed logins per minute, and logins during a peak period.
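As a minimal sketch of what defining a steady state could look like in code, the snippet below derives a tolerance band from baseline samples of a business metric and checks whether a new observation falls inside it. The function names, the sample values and the three-standard-deviation tolerance are illustrative assumptions, not part of any particular tool.

```python
from statistics import mean, stdev


def steady_state_band(samples, tolerance_sigmas=3.0):
    """Return (low, high) bounds computed from baseline samples of a metric."""
    centre = mean(samples)
    spread = stdev(samples)
    return centre - tolerance_sigmas * spread, centre + tolerance_sigmas * spread


def within_steady_state(value, band):
    """Return True if an observed value lies inside the steady-state band."""
    low, high = band
    return low <= value <= high


# Hypothetical baseline: stream starts per second sampled under normal load.
baseline = [1000, 1010, 990, 1005, 995]
band = steady_state_band(baseline)
```

During an experiment you would compare each new reading against `band`; a reading far outside it suggests the injected failure has pushed the system out of its steady state.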

Minimizing the Blast Radius

Chaos experiments that run in production can cause unexpected system outages and negative customer impact. Because some failures are inevitable, you need to ensure that the negative impact of each experiment is contained and minimized, for example by exposing only a small share of traffic or hosts to the injected failure.
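One common way to contain an experiment is to expose only a small, stable slice of users to it. The sketch below assumes a hypothetical `in_blast_radius` helper that hashes a user ID into one of 10,000 buckets, so the same user always gets the same answer for a given experiment name.

```python
import hashlib


def in_blast_radius(user_id: str, percent: float, experiment: str = "exp-1") -> bool:
    """Deterministically decide whether a user is exposed to the experiment.

    `percent` is the share of users to expose (0 to 100). The experiment
    name salts the hash so different experiments pick different users.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) % 10_000  # bucket in 0..9999
    return bucket < percent * 100  # percent=1 exposes buckets 0..99
```

Because the decision depends only on the inputs, you can widen or shrink the exposed group by changing `percent` without keeping any per-user state.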

Continuous Chaos

Continuous chaos experiments allow you to identify system failures automatically, which leaves you more time to implement new services and features. One-off experiments are a great way to start, but to keep building confidence in your system as it changes, it’s advisable to automate your experiments and run them on an ongoing basis.

Scaling the Blast Radius

Chaos engineering isn’t about causing outages, but about learning how your system behaves under failure. Hence, you need to follow a granular approach when injecting failures. This means injecting a small failure, examining the system’s output and the impact of the failure, and noting your observations. If the system absorbs the failure without any observable impact, increase the chaos and, consequently, the blast radius. By scaling the blast radius step by step, you can identify further failure modes that reflect real-life system behavior.
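The incremental approach described above can be sketched as a simple loop. Here `run_experiment` is a hypothetical callable that you supply; it runs the experiment at a given blast radius (as a percentage) and returns True if the steady state was violated. The step values are illustrative.

```python
def scale_blast_radius(run_experiment, steps=(1, 5, 25, 50)):
    """Run the experiment at growing blast radii and stop at the first impact.

    Returns the percentage at which an impact was first observed,
    or None if every step completed without a steady-state violation.
    """
    for percent in steps:
        if run_experiment(percent):
            return percent
    return None
```

Stopping at the first observed impact keeps the experiment from growing past the point where it has already taught you something.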

Running Chaos Engineering Experiments

Production systems are bound to fail, but chaos engineering helps you develop applications that can cope with unexpected events and inevitable disasters.

Below are the steps to follow to effectively run a chaos engineering experiment.

Formulate a hypothesis.

To successfully run chaos experiments, you need to make some realistic assumptions about how your system will behave when it encounters unexpected events or failures. The best way to develop your hypothesis is to discuss how the app should react to unexpected changes with all those involved in its development and operation.

You can kick off the brainstorming session by asking several “what if” questions and allowing everyone on the development, support engineering, and operations teams to come up with several scenarios that could affect your system’s steady state. By sitting with your team and whiteboarding your dependencies (external and internal), data stores and services, you can create a picture of what could go wrong in your system.

Inject realistic failures and bugs.

Your chaos experiments should reflect likely and realistic scenarios. Injecting real failures during your experiments will help you get a good sense of what technologies and processes need an upgrade. For instance, you can proactively inject events that correspond to realistic software failures (like malformed messages and responses), hardware failures (like server crashes or scaling events), or non-failure events (like traffic spikes).
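To make this concrete, here is a minimal sketch of application-level fault injection in Python. The `inject_chaos` decorator and the `InjectedFailure` exception are hypothetical names invented for this example; they do not come from a specific chaos library. The decorator can add latency to a call and fail it with a configurable probability.

```python
import functools
import random
import time


class InjectedFailure(RuntimeError):
    """Raised in place of a real dependency error during an experiment."""


def inject_chaos(error_rate=0.0, latency_s=0.0, rng=None):
    """Wrap a function so calls may be delayed or fail with probability error_rate."""
    rng = rng or random.Random()

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if latency_s:
                time.sleep(latency_s)  # simulate a slow dependency
            if rng.random() < error_rate:
                raise InjectedFailure(f"injected failure in {func.__name__}")
            return func(*args, **kwargs)

        return wrapper

    return decorator
```

Wrapping a client call, for instance with `@inject_chaos(error_rate=0.1, latency_s=0.2)`, lets you observe how callers cope with slow or failing dependencies without touching the dependency itself.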

Measure the impact.

To fully comprehend how your system behaves under stress or the changes in its steady state behavior when it encounters a bug, you need to analyze your experiment’s outcome on the system. You should measure the impact of the failures on key performance metrics that correlate to customer success. Examples would be requests per second, orders per minute, or stream starts per second.
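One simple way to quantify impact is to compare the same metric between a control group and the group that received the injected failure. The sketch below assumes you already collect per-interval samples (for example, orders per minute) for both groups; the function name is illustrative.

```python
def relative_impact(control, experiment):
    """Return the fractional change of the experiment mean versus the control mean.

    A result of -0.1 means the metric was 10% lower in the experiment group.
    """
    control_mean = sum(control) / len(control)
    experiment_mean = sum(experiment) / len(experiment)
    if control_mean == 0:
        raise ValueError("control mean is zero; relative impact is undefined")
    return (experiment_mean - control_mean) / control_mean
```

A plain difference of means ignores noise, so in practice you would likely pair it with enough samples or a statistical test before drawing conclusions.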

Verify or disprove your hypothesis.

After running chaos experiments, you’ll either discover a problem that needs to be fixed, or verify that your system is resilient to your injected failure. Both of these outcomes are good; they will increase your confidence in the entire system’s capabilities, or uncover problems that you need to remediate before they cause an outage in production.

Since chaos engineering is mostly about formulating a hypothesis and then verifying or disproving it, the more details you gather about your system, the better you can make predictions based on known vulnerabilities.
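Writing a hypothesis down in a structured form makes the verify-or-disprove step explicit. The following is a sketch only: the `Hypothesis` class, its fields and the example threshold are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """A falsifiable statement about a metric during an injected failure."""

    description: str
    metric: str
    min_acceptable: float  # lowest value still considered steady state

    def verdict(self, observed: float) -> str:
        """Return 'verified' if the observation meets the threshold, else 'disproved'."""
        return "verified" if observed >= self.min_acceptable else "disproved"


# Hypothetical example: orders keep flowing when one cache node is lost.
cache_loss = Hypothesis(
    description="Losing one cache node does not reduce order throughput",
    metric="orders_per_minute",
    min_acceptable=95.0,
)
```

Either verdict is useful: a verified hypothesis adds confidence, and a disproved one points at something to fix.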

Integrating Chaos Engineering into CI/CD

Even though automated CI/CD pipelines enable fast product iterations, provide standardized feedback loops for developers and reduce the chance of manual errors, they can’t predict all of an application’s failure modes. Therefore, organizations need solutions that help them discover an application’s vulnerabilities and understand how it performs when one or more components are affected at build time. This is where chaos engineering intersects with DevOps.

By integrating chaos engineering into CI/CD pipelines, you can build better, antifragile applications and ensure that reliability is baked into every component of your system. When you break things on purpose and test how a system works under stress, you can detect application failures and fix them before they cause a costly outage. This also leads to fewer repeat incidents, a shorter mean time to respond to high-severity incidents, improved system design, and more resilient systems.
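In a pipeline, a chaos stage typically needs to turn experiment results into a pass or fail signal. The sketch below assumes the experiments have already run and produced a mapping of experiment name to a verdict string; the names and the "verified" convention are hypothetical. Most CI systems treat a non-zero exit code as a failed step.

```python
import sys


def pipeline_exit_code(verdicts):
    """Map experiment verdicts to an exit code: 0 passes the stage, 1 fails it."""
    failed = [name for name, verdict in verdicts.items() if verdict != "verified"]
    for name in failed:
        # Report each failed experiment so the pipeline log explains the result.
        print(f"chaos experiment failed: {name}", file=sys.stderr)
    return 1 if failed else 0
```

A pipeline step could end with `sys.exit(pipeline_exit_code(results))`, so that a disproved hypothesis blocks the deployment in the same way a failing test would.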

Netflix has already integrated chaos engineering into their CI/CD pipelines. The company developed ChAP (Chaos Automation Platform) to overcome the limitations of FIT (failure injection testing) and increase the pace, breadth and safety of their experimentation. They use FIT to build more resilient systems by propagating failures across the entirety of their system, in a controlled and consistent way.

At a high level, ChAP automates experiments: it interrogates the Netflix deployment pipeline for a user-specified service, launches both a control group and an experimental group of that service, and routes a small amount of traffic to each group.

If the impact of an experiment exceeds a predetermined error budget or threshold, ChAP ends the automated experiment to prevent catastrophic damage. Netflix also integrated ChAP with Spinnaker, an open source continuous delivery platform built by Netflix and supported by Oracle, Microsoft and Google. This allows engineering teams to run experiments continuously, using ChAP to identify mistuned retry policies, CPU-intensive fallbacks, and unexpected interactions between load balancers and circuit breakers.
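The idea of stopping an experiment once it exceeds an error budget can be illustrated with a short sketch. This is not ChAP’s actual algorithm; the function name, the input shape and the way excess errors are counted are assumptions chosen to show the principle of comparing an experiment group against a control group and aborting early.

```python
def run_with_abort(windows, error_budget):
    """Consume (control_errors, experiment_errors) pairs, one per time window.

    Accumulates the errors the experiment group produced beyond the control
    group and aborts as soon as that excess goes over the error budget.
    """
    excess = 0
    for index, (control_errors, experiment_errors) in enumerate(windows):
        excess += max(0, experiment_errors - control_errors)
        if excess > error_budget:
            return {"aborted": True, "windows_run": index + 1, "excess_errors": excess}
    return {"aborted": False, "windows_run": len(windows), "excess_errors": excess}
```

Checking the budget after every window means the experiment stops at the first window in which the damage becomes unacceptable, rather than running to completion.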

Microsoft also uses automated fault injection techniques and chaos engineering principles to increase confidence and resilience in the applications they deliver to customers, the products they ship, and the services they make available to developers.

Ultimately, the need to integrate chaos engineering into CI/CD pipelines will only grow as customers rely increasingly on functional systems, threats become more sophisticated, and room for error shrinks. By using chaos engineering and fault injection, developers can measure, understand and improve application resilience. Architects can build confidence in their designs, and operations teams can also validate new data centers and hardware before they roll them out for customers.

Thundra Chaos Injection Feature

Using chaos engineering, developers and engineering teams can build more resilient distributed, business-critical or high-availability systems.

Thundra uses chaos injection to incorporate chaos engineering into services. This feature allows you to proactively inject failures into your applications to simulate failures and see how they affect your system. Thundra gives you tools to run chaos engineering experiments on modern architectures and test your architecture’s resilience before any issue occurs. Thundra currently supports chaos injection in Python, Node.js, and Java.

Source: InApps.net
