Garbage in the Cloud
On two occasions I have been asked,—”Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?” … I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
—Charles Babbage, 1864
The long-standing computer science principle of “garbage in, garbage out” (GIGO) is so fundamental to IT that it predates digital computing by almost a century. And yet here we are in the twenty-first century, moving to the Cloud, and Babbage’s exasperated response is as true and on point as ever. For not only is the Cloud a magnet for all sorts of garbage, it is also generating new garbage at a brisk clip.
Uploading Garbage to the Cloud
In today’s frantic rush to “move to the Cloud,” too many organizations are failing to ask what they should move to the Cloud. Instead, they envision the Cloud as some kind of huge, nebulous server in the sky, a perfect receptacle for whatever they have on-premise. Got email? Put it in the Cloud! Got data? Put your data in the Cloud, the bigger the better! Running business processes on-premise? Move them to the Cloud!
Not so fast. Let’s slow down a bit and consider the ramifications of moving too quickly—and haphazardly—to the Cloud.
- Unclean data – This is the obvious example, pure GIGO. If your current on-premise data are unclean (say you have inconsistent customer demographic information, obsolete product information, or any other data quality challenge), it goes without saying that moving such information to the Cloud won’t do your data, or your business, any good. Instead, think of moving your data to the Cloud as though you were moving your elderly parents to a condo. It’s a wonderful excuse to finally dig through the layers of detritus so that you only move data that are clean, accurate, and valuable to the business.
- Spaghetti code – You may be eyeing that old custom-coded legacy app as a prime Cloud candidate. It’s too slow, it doesn’t scale well, and it’s a bear to integrate now, so won’t the Cloud automatically make it fast, scalable, and easy to integrate? Sorry to burst your bubble. If you’re focusing on an IaaS approach, what you’ll find is that spaghetti code is every bit as intractable in the Cloud as it is on-premise. What about PaaS? Chances are that old code won’t run at all. Today’s PaaS environments expect and enforce a certain level of code quality.
- Obsolete and Cloud-unfriendly business processes – Does this sound familiar? The business asks IT to automate a set of processes, but states unequivocally that “the processes are fine the way they are. Automate them but don’t change them. After all, we’ve been doing things the same way for years. Why change now?” Yes, the business often says that, but seasoned IT veterans have long realized that the business never actually means it. When the business asks IT to touch a process, there is always at least an implied requirement to try to make it better: faster, more streamlined, better aligned with the underlying business need.
Moving business process implementations to the Cloud raises the stakes in this complex dance between business and technology, because the Cloud offers a wealth of new opportunities for improving processes. Furthermore, how users interact with Cloud-based assets is often fundamentally different from how users interact with traditional enterprise apps. Any organization that has moved from an older CRM app (or no CRM app at all) to Salesforce.com has learned this lesson first hand. But Salesforce is merely a harbinger of greater change to come. One of the main reasons Salesforce has been so successful is that it offers its clients new ways of conducting business: in other words, better processes. Any SaaS solution should build on its example.
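To make the "unclean data" point above concrete, here is a minimal sketch of the kind of audit you might run before moving customer records anywhere. The record layout (id, name, email) and the function name audit_customers are assumptions made up for illustration; a real audit would cover whatever fields your business actually depends on.

```python
from collections import defaultdict

def audit_customers(records):
    """Return (duplicates, incomplete) for a list of customer dicts.

    The field names here are hypothetical; adapt them to your own schema.
    """
    by_email = defaultdict(list)
    incomplete = []
    for rec in records:
        # Normalize the email so trivial differences don't hide duplicates.
        email = (rec.get("email") or "").strip().lower()
        name = (rec.get("name") or "").strip()
        if not email or not name:
            incomplete.append(rec["id"])
            continue
        by_email[email].append(rec["id"])
    # Any email shared by more than one record is a likely duplicate.
    duplicates = {e: ids for e, ids in by_email.items() if len(ids) > 1}
    return duplicates, incomplete

customers = [
    {"id": 1, "name": "Ada Lovelace", "email": "ada@example.com"},
    {"id": 2, "name": "A. Lovelace", "email": "ADA@example.com "},
    {"id": 3, "name": "", "email": "nobody@example.com"},
]
duplicates, incomplete = audit_customers(customers)
print("Likely duplicates:", duplicates)
print("Incomplete records:", incomplete)
```

Records flagged here are candidates for cleanup or retirement before the move, not after it.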
Generating New Garbage in the Cloud
The garbage problem doesn’t end with garbage you might put in the Cloud. The Cloud also presents numerous opportunities to generate new kinds of garbage.
- Zombie instances – It’s so easy and cheap now for anyone in your organization to spawn their own Cloud instances, including virtual machines, storage instances, and more. Furthermore, such instances are elastic: need more of them? The Cloud is only too happy to oblige. But what happens when you’re done with them? You’re supposed to delete them. After all, elasticity works in both directions. All too often, however, instances that have served their purpose are left around like so much space junk. After a while, nobody remembers what they’re for or whether they still have something important in them. The last thing you want to do is delete an instance with valuable data or code on it. So to play it safe, you leave it around. Forever. Your Cloud provider is only too happy to keep billing you for these zombie instances.
- Data with no provenance – Any Antiques Roadshow aficionado knows that antiques with provenance are more valuable than those without. The same goes for your data. Do you know if the data you’re working with are the latest version? Do you know they haven’t been tampered with? If not, then those data are worse than useless, since they may be incorrect or, worse still, keeping them around may violate any number of regulations. Here again the elasticity of the Cloud works against you: the easier it is to copy and store data, the easier it is to lose track of where a given copy came from.
- Manual or poorly abstracted configurations – Let’s say you’ve built a sophisticated Cloud app based on elastic VM instances. If you need more, simply provision more. But then let’s say some admin somewhere in your IT shop goes into one of these instances and changes a config file in order to get an app to run on that instance. Now you have no way to update your instances without breaking your app—and if that admin didn’t tell anybody about the reconfiguration, then tracking down the problem will present a time-consuming challenge.
Simply creating a static image file to generate new VM instances, and keeping rogue admins from monkeying with them, won’t solve the problem, because there is more to your app than the instances. Instead, you need a next-generation configuration management approach that automates configuration for the Cloud. See Chef or Puppet for an indication of where this market is going. (You can expect a ZapFlash on Cloud configuration management in the near future.)
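The rogue-admin scenario above is usually called configuration drift. The sketch below shows the core idea behind detecting it: keep one declared "desired" configuration and compare every instance against it. This is plain Python for illustration only, not the Chef or Puppet API; the setting names, instance names, and the find_drift helper are all hypothetical.

```python
def find_drift(desired, actual):
    """Return {key: (desired_value, actual_value)} for every mismatch.

    A key missing on one side shows up with None on that side.
    """
    drift = {}
    for key in desired.keys() | actual.keys():
        want, have = desired.get(key), actual.get(key)
        if want != have:
            drift[key] = (want, have)
    return drift

# The single declared source of truth (hypothetical settings).
desired = {"max_connections": 100, "log_level": "info"}

# What each instance actually reports (hypothetical fleet).
fleet = {
    "vm-01": {"max_connections": 100, "log_level": "info"},
    "vm-02": {"max_connections": 500, "log_level": "info", "debug_port": 9000},
}

report = {name: find_drift(desired, cfg) for name, cfg in fleet.items()}
drifted = {name: d for name, d in report.items() if d}
print(drifted)
```

A configuration management tool goes a step further than this report: it reapplies the desired state automatically, so a hand edit on one instance does not survive the next run.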
- Cloud-unfriendly architecture choices – We covered one example of this problem, stateful Cloud instances, in our ZapFlash “The Secret to a RESTful Cloud.” Essentially, inappropriate state information is just more garbage in the Cloud. Another example would be inappropriate transactionality in the Cloud. Cloud Computing lends itself to particular ways of architecting applications, and attempting to shoehorn the wrong architectural approach into the Cloud is about as effective as Cinderella’s stepsisters’ efforts with the glass slipper.
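Returning to the provenance point in the list above, one simple way to answer "has this been tampered with?" is to record a cryptographic digest alongside the data when they are written, and to check it again before use. The sketch below uses SHA-256 from the Python standard library; the record fields (source, version, recorded_at) and both function names are assumptions chosen for illustration.

```python
import hashlib
from datetime import datetime, timezone

def make_provenance(data: bytes, source: str, version: int) -> dict:
    """Build a provenance record for a blob of data (hypothetical layout)."""
    return {
        "source": source,
        "version": version,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }

def is_untampered(data: bytes, record: dict) -> bool:
    """True if the data still match the digest stored in the record."""
    return hashlib.sha256(data).hexdigest() == record["sha256"]

original = b"customer_id,region\n42,EMEA\n"
record = make_provenance(original, source="crm-export", version=3)

print(is_untampered(original, record))                 # unchanged copy
print(is_untampered(original + b"43,APAC\n", record))  # modified copy
```

A digest only helps if the record itself is stored somewhere the data's editors cannot quietly rewrite it, which is a governance question as much as a technical one.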
The ZapThink Take
How do you avoid garbage in the Cloud? Architecture is a large part of the answer, of course, but governance is equally important. Organizations should establish and enforce Cloud-centric policies as well as extend current IT governance to the Cloud. With great power comes great responsibility, and the Cloud offers enormous new power to many different roles within the IT organization. The Cloud is fraught with pitfalls. Without sufficient governance, you’re bound to fall into one.
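As one illustration of what a Cloud-centric policy might look like in practice, consider a rule aimed at zombie instances: every Cloud resource must name an owner and carry an expiry date. The sketch below checks an inventory against that rule. The inventory format, the tag names, and the policy itself are assumptions for the sake of example, not a recommendation of specific values.

```python
from datetime import date

def policy_violations(instances, today):
    """Return {instance_id: [problems]} for resources that break the policy."""
    violations = {}
    for inst in instances:
        problems = []
        if not inst.get("owner"):
            problems.append("no owner")
        expires = inst.get("expires")
        if expires is None:
            problems.append("no expiry date")
        elif expires < today:
            problems.append("expired")
        if problems:
            violations[inst["id"]] = problems
    return violations

# Hypothetical inventory, e.g. exported from a provider's console.
inventory = [
    {"id": "vm-01", "owner": "billing-team", "expires": date(2030, 1, 1)},
    {"id": "vm-02", "owner": "", "expires": date(2020, 6, 30)},
    {"id": "bucket-7", "owner": "analytics"},
]

violations = policy_violations(inventory, today=date(2025, 1, 1))
for instance_id, problems in violations.items():
    print(instance_id, "->", ", ".join(problems))
```

With an owner on record, somebody can actually answer the question of whether an instance still holds anything important, which is what makes it safe to delete.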
It is also important to note that the issues in this ZapFlash apply equally to private and public Clouds. Organizations generally realize that public Clouds present numerous governance challenges, and look to private Clouds because they are ostensibly less risky. But such a stance offers little more than a false sense of security, one that may backfire if organizations assume that in the absence of proper architecture and governance, a private Cloud is the better choice. Don’t wait to implement adequate Cloud governance until after you’ve run into these problems. Governance should be an integral part of any Cloud strategy, before you move to the Cloud.