AWS Outage

April 27th, 2011 | by Santo | tech

The recent Amazon Web Services (AWS) outage was one of its most severe. It lasted upwards of 3 days for some of the companies and applications built on top of AWS.

Some of the prominent services and applications partially or fully affected by it were:

  • Quora (www.quora.com)
  • Reddit (www.reddit.com)
  • Heroku (www.heroku.com)

Surprisingly, services like Netflix (www.netflix.com), which are heavily dependent on AWS, reported little or no outage.

Heroku (see http://status.heroku.com/incident/151) maintained a great status page for this incident and has provided a lot of insight into both the problems they faced and how they tackled them. Heroku’s escalation procedures and decision matrix are absolutely exemplary.

Some observations

  • AWS still remains one of the most robust Cloud Services providers. Their last critical outage was in June 2008.
  • Services like EC2 were restored in a few hours.
  • Organisations which chose “Multi-Region Redundancy” over “Multi-Zone Availability”, e.g. Netflix, did not face any issues.
  • Dependence on specific AWS services like EBS caused a longer outage for some providers.
  • Amazon needs to be more transparent about the internal architecture it is using to provide its services to shore up consumer confidence.
  • Having your own organisational cloud DR (Disaster Recovery) plan in place is absolutely necessary.
  • If you are looking at porting a critical application/service to the cloud, then a “Multi-Cloud” approach would be very prudent.
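
To make the multi-region / multi-cloud point a bit more concrete, here is a minimal sketch of the failover idea: keep an ordered list of deployments and route to the first one that passes a health check. This is only an illustration under my own assumptions, not how Netflix or Heroku actually do it. The endpoint names and the `pick_endpoint` helper are hypothetical, and the health check is passed in as a plain function so the example does not depend on any particular provider API.

```python
from typing import Callable, Optional, Sequence

# Hypothetical deployments, listed in order of preference.
# The names are illustrative only and do not refer to real hosts.
ENDPOINTS = [
    "app.us-east.example.com",      # primary region
    "app.us-west.example.com",      # second region, same provider
    "app.other-cloud.example.com",  # deployment on a different provider
]


def pick_endpoint(
    endpoints: Sequence[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first endpoint whose health check passes, or None."""
    for endpoint in endpoints:
        try:
            if is_healthy(endpoint):
                return endpoint
        except Exception:
            # A health check that raises is treated as "unhealthy".
            continue
    return None


if __name__ == "__main__":
    # Simulate an outage of the primary region.
    def primary_is_down(endpoint: str) -> bool:
        return "us-east" not in endpoint

    print(pick_endpoint(ENDPOINTS, primary_is_down))
```

In a real setup the health check would probe the service over the network and the switch would usually happen at the DNS or load balancer level, but the decision logic is the same: never assume a single zone, region or provider is always available.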
