Email management provider Mimecast today suffered a UK outage caused by a network hardware failure, leaving customers temporarily unable to send or receive email.
The first reports of problems came in at approximately 1100 GMT, with Mimecast acknowledging the issue at 1215.
Around forty minutes later, the company tweeted that a fix was being rolled out, reportedly on a tower-by-tower basis. At the time of writing, the majority of servers were back up and running.
Yet the outage itself is not the full story. Mimecast, for better or worse, offers a 100% uptime service level agreement (SLA), a promise now torn to shreds by today's difficulties.
To rub salt into the wound, a May 8 blog post by Orlando Scott-Cowley about Postini services notes: "Any downtime can be expensive and disruptive.
“That is why Mimecast offers a 100% uptime SLA to our customers, because the difference between the ‘nines’ becomes critical when an email outage hits. Something that Google customers are finding out the hard way.”
As is the way of these things, Twitter was abuzz with fear, uncertainty and doubt as the issue broke. Some users were angry, some were morose, and others were more philosophical.
In this situation it may be wise to follow Fennell's advice and wait for the dust to settle. Regardless, the incident certainly reopens the age-old debate over SLAs.
What are the next steps from here?
So can a company genuinely offer 100% uptime and, if so, how? Speaking at the Apps World event in 2012, Matthew Finnie, CTO of Interoute, told CloudTech that a 100% SLA was "nonsense", though he noted that it could eventually be achievable "through a combination of multiple zones".
A study by the International Working Group on Cloud Computing Resiliency (IWGCR) last year found that cloud services averaged only around 99.917% availability, equivalent to roughly seven hours of downtime a year. Meanwhile, Brandon Wade, CEO of dating website WhatsYourPrice.com, declared that 100% uptime was essential for his business after he dumped AWS following last year's widely publicised outages.
Another question is how many customers chose Mimecast on the strength of its 100% availability promise, and what the practical consequences will be for both the company and the wider cloud industry once the furore dies down.
“You can have cloud running for a year without an outage or issue, but have one incident and customers rapidly forget all the good! Get it wrong in the cloud and the impact can be widespread rapidly.
"Cloud continues to be more effective and reliable in most cases than on-network solutions. When an on-network system falls over at a customer it gets no publicity, when a cloud system does so it gets massive coverage due to its impact on a wider range of clients who through crowd sourcing can share their thoughts on social media."
Moyse, who sits on the board of EuroCloud UK and the Governance Board of the Cloud Industry Forum, also warned of a potential knock-on effect for Mimecast, adding: "A large portion of their customers are big legal firms who rely on immediacy and always-available email.
“This will not be a good day for the staff in their office as can be seen from the very public rebuttals of customers on Twitter for the past 3+ hours.”
The company has since apologised and published a best-practice blog post updating customers on the situation and advising them on how to restore full email functionality.
But what’s your opinion? How will this affect people’s perception of cloud computing, and what is the way forward for SLA dialogue?