AWS Outage

Contributed by Santo on 27 Apr 2011

The recent Amazon Web Services (AWS) outage was one of its most severe. It lasted upwards of 3 days for some companies and applications built on top of AWS.

Some of the prominent services and applications partially or fully affected by it were:

Quora (www.quora.com)
Reddit (www.reddit.com)
Heroku (www.heroku.com)

Surprisingly, services like Netflix (www.netflix.com), which depend heavily on AWS, saw little or no reported outage.

Heroku (see http://status.heroku.com/incident/151) maintained a great status page for this incident and has provided a lot of insight into both the problems they faced and how they tackled them. Heroku’s escalation procedures and decision matrix are absolutely exemplary.

Some observations

AWS still remains one of the most robust Cloud Services providers. Their last critical outage was in June 2008.
Services like EC2 were restored in a few hours.
Organisations which chose “Multi-Region Redundancy” over “Multi-Zone Availability”, e.g. Netflix, did not face any issues.
Dependence on specific AWS services like EBS caused a longer outage for some providers.
Amazon needs to be more transparent about the internal architecture it uses to provide its services, to shore up consumer confidence.
Having your own organisational cloud DR (Disaster Recovery) plan in place is absolutely necessary.
If you are looking at porting a critical application/service to the cloud, then a multi-cloud approach would be very prudent.
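
To make the difference between multi-zone, multi-region and multi-cloud deployments concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the Deployment class, the helper functions, the region/zone labels and the "other-cloud" provider are all hypothetical names chosen for this example, and the health check is a stand-in for whatever monitoring you actually run.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Deployment:
    """One copy of the application (hypothetical model for illustration)."""
    provider: str  # e.g. "aws" or a second cloud provider
    region: str    # illustrative region label
    zone: str      # illustrative availability-zone label


def survives_region_outage(deployments: Sequence[Deployment],
                           provider: str, region: str) -> bool:
    """True if at least one copy lives outside the failed provider/region."""
    return any((d.provider, d.region) != (provider, region)
               for d in deployments)


def pick_deployment(deployments: Sequence[Deployment],
                    is_healthy: Callable[[Deployment], bool]
                    ) -> Optional[Deployment]:
    """Return the first healthy copy in priority order, or None."""
    for d in deployments:
        if is_healthy(d):
            return d
    return None


# Two zones in one region: protects against a single-zone failure only.
multi_zone = [
    Deployment("aws", "us-east-1", "us-east-1a"),
    Deployment("aws", "us-east-1", "us-east-1b"),
]

# Copies in two regions: a region-wide problem leaves one copy standing.
multi_region = [
    Deployment("aws", "us-east-1", "us-east-1a"),
    Deployment("aws", "us-west-1", "us-west-1a"),
]

# Copies on two providers ("other-cloud" is a placeholder name).
multi_cloud = [
    Deployment("aws", "us-east-1", "us-east-1a"),
    Deployment("other-cloud", "region-1", "zone-a"),
]


def east_is_down(d: Deployment) -> bool:
    """Stand-in health check: pretend the whole us-east-1 region failed."""
    return not (d.provider == "aws" and d.region == "us-east-1")


if __name__ == "__main__":
    for name, plan in [("multi-zone", multi_zone),
                       ("multi-region", multi_region),
                       ("multi-cloud", multi_cloud)]:
        print(name, "->", pick_deployment(plan, east_is_down))
```

In this toy model the multi-zone plan has no healthy copy left when the whole region is unavailable, while the multi-region and multi-cloud plans still have somewhere to fail over to. A real DR plan would also have to cover data replication and DNS or load-balancer switching, which this sketch leaves out.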

Tags: aws, cloud computing, EBS, Outage, Quora, Reddit, RightSize
