Netflix, a large AWS customer, described in a blog post (http://techblog.netflix.com/2010/12…g-aws.html) some of the lessons they learned while building on AWS services. One of those lessons is that they run a process called the Chaos Monkey (described below).
I want a T-shirt design of the Chaos Monkey. Use your imagination for what that should look like: a badass monkey is fine, and so is something more abstract.
You can call it/him/her the Chaos Monkey, AWS Chaos Monkey, or Amazon Chaos Monkey; any of these is fine.
Netflix's description of the Chaos Monkey:
3. The best way to avoid failure is to fail constantly.
We’ve sometimes referred to the Netflix software architecture in AWS as our Rambo Architecture. Each system has to be able to succeed, no matter what, even all on its own. We’re designing each distributed system to expect and tolerate failure from other systems on which it depends.
If our recommendations system is down, we degrade the quality of our responses to our customers, but we still respond. We’ll show popular titles instead of personalized picks. If our search system is intolerably slow, streaming should still work perfectly fine.
One of the first systems our engineers built in AWS is called the Chaos Monkey. The Chaos Monkey’s job is to randomly kill instances and services within our architecture. If we aren’t constantly testing our ability to succeed despite failure, then it isn’t likely to work when it matters most – in the event of an unexpected outage.
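For designers curious about what that means in practice, here is a rough Python sketch of the idea described above: something kills instances at random, and the rest of the system keeps answering with a degraded response. This is not Netflix's code. The names `Service`, `chaos_monkey`, and `home_page`, and the placeholder titles, are all made up for illustration.

```python
import random

# Placeholder fallback data, invented for this example.
POPULAR_TITLES = ["Popular Title A", "Popular Title B"]


class Service:
    """A toy service backed by a pool of interchangeable instances."""

    def __init__(self, name, instance_count):
        self.name = name
        self.instances = {f"{name}-{i}" for i in range(instance_count)}

    def is_up(self):
        # The service counts as up while at least one instance survives.
        return bool(self.instances)


def chaos_monkey(services, rng):
    """Terminate one randomly chosen instance.

    Returns the id of the killed instance, or None if nothing is left to kill.
    """
    candidates = [(s, i) for s in services for i in sorted(s.instances)]
    if not candidates:
        return None
    service, instance = rng.choice(candidates)
    service.instances.discard(instance)
    return instance


def home_page(recommendations, user):
    """Still respond when recommendations are down, just with less."""
    if recommendations.is_up():
        return [f"Personalized pick for {user}"]
    return POPULAR_TITLES


rng = random.Random(0)
recs = Service("recommendations", 2)

# Let the monkey loose three times on a service with two instances.
killed = [chaos_monkey([recs], rng) for _ in range(3)]
page = home_page(recs, "alice")
```

After the third run there is nothing left to kill, the recommendations service is down, and `home_page` falls back to the popular titles instead of failing.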