Bring out yer dead!

Let it begin: the “I told you so’s” and the “AWS isn’t so great now, is it?” comments. LinkedIn members, Facebook users and Google News are all taking part in trashing AWS as aggressively as they can. Memes, simple jokes and even plain hate are spewing over a single incident. Granted, it was widespread and severe, but it was also resolved quickly and transparently. Let’s keep that last part in mind for later.

Bring out your dead!

So what was the underlying technical issue? AWS’ S3 (Simple Storage Service) stopped serving up its content to customers and ultimately stopped servicing any internal AWS capabilities tied to the S3 service, rendering a significant portion of AWS services inoperable. The specific technical details are still hidden, as is most intellectual property at any company, but we know that it was tied to error handling within S3 and that it was resolved within the same day that it was found. Pretty impressive for any enterprise, if you ask me. Can your IT department make the same claim for any P1S1 that they’ve had as of late?

Transparency - The New IT Service Trend

Here’s where I think AWS did things 100% correctly. As with the GitLab debacle, AWS chose to keep customers up to date with new findings and not to sweep the issue under the rug. This doesn’t solve the issue itself, but we now find ourselves in an honest position of risk management. How many times has an IT provider, whether it’s an online service or a hardware vendor, answered a ticket with “We’ll get back to you shortly” even though the issue is severe? How many times did you as a customer have to re-engage support to keep your tickets moving? We in the IT world tend to have short memories when it comes to issues like these. It would do us well to remember that issues not nearly this severe tend not to get resolved as quickly as we saw with S3. Even more so, IT departments tend to have to drive their technology partners to resolution, which seems to be a backwards arrangement.

I’m not making light of this situation, or at least not attempting to. Many critical services around the world use AWS as their go-to IT resource, so incidents such as this have to be taken seriously. That being said, I think it’s also unfair to place the entire onus on AWS. The IT community, especially in any enterprise workspace, shares ownership of any issue, especially one that could have been avoided by building, or working with a partner to build, a properly resilient and redundant cloud environment. To some degree, anyone who places critical data in a public cloud is as responsible as the cloud provider itself during an issue like this.

It’s only when you don’t learn that you truly fail.

So, what next? Did David get exposed for being fragile? Is this the AWS Achilles heel? I don’t think so. If anything, I think it’s going to bring to light the shortcomings of traditional IT customers who use new cloud technologies and architectures. Ask yourself, did Netflix go down? Nope. A company known for advanced deployment of its service found a way to design a redundant and resilient solution on top of Amazon. As with any design, these measures should be taken up front, and the risk will be reduced.
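To make "redundant and resilient" a little more concrete, here is a minimal sketch of one such up-front measure: reading an object from a second replica when the first one is unavailable. This is purely illustrative and is not how Netflix or AWS implement anything. The names `read_with_failover`, `make_fake_region` and `AllRegionsFailed` are hypothetical, the region labels are only examples, and the in-memory "regions" stand in for real storage clients.

```python
from typing import Callable, Dict, List, Sequence


class AllRegionsFailed(Exception):
    """Raised when none of the configured replicas could serve the object."""


def read_with_failover(key: str, fetchers: Sequence[Callable[[str], bytes]]) -> bytes:
    """Try each replica in order and return the first successful read."""
    errors: List[Exception] = []
    for fetch in fetchers:
        try:
            return fetch(key)
        except Exception as exc:  # a real client would catch narrower errors
            errors.append(exc)
    raise AllRegionsFailed(f"all {len(fetchers)} replicas failed for {key!r}: {errors}")


def make_fake_region(name: str, healthy: bool, store: Dict[str, bytes]) -> Callable[[str], bytes]:
    """Hypothetical stand-in for a per-region storage client."""
    def fetch(key: str) -> bytes:
        if not healthy:
            raise ConnectionError(f"{name} unavailable")
        return store[key]
    return fetch


# Assumption: the same object has already been replicated to both regions.
store = {"index.html": b"<h1>hello</h1>"}
primary = make_fake_region("us-east-1", healthy=False, store=store)
secondary = make_fake_region("us-west-2", healthy=True, store=store)

# The primary is down, so the read is served by the secondary.
result = read_with_failover("index.html", [primary, secondary])
```

The point of the sketch is the design choice, not the code: the replica list and the replication behind it have to exist before the incident, which is exactly the kind of measure that needs to be taken up front.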

AWS Resiliency

So what are we to learn? Deploying on AWS (or on any cloud infrastructure provider) is not an easy undertaking. Architecting a solution around these providers is a shift from traditional IT deployment methodologies, and traditional skill sets aren’t meeting the standard needed to deploy on these technologies. I’ve also learned that any exposure of issues with new technologies gets slung around worse than the normal FUD in our industry, since AWS is the common enemy of all traditional infrastructure companies.

On-prem services will never die, just as the mainframe is still here. On-prem has its use case, as does the cloud. Those who are fighting for a single solution to win will find themselves the only losers of this game.

Matt Hoyt is a current IT Architect in Columbus, OH. All thoughts displayed here are his and his alone.