Garbage data in, garbage information out: Big data or big garbage?
Do you know the computer technology saying that garbage data in results in garbage information out?
In other words, even with the best algorithms and hardware, bad, junk or garbage data put in results in garbage information delivered. Of course, you might have data analysis and cleaning software to look for, find and remove bad or garbage data; however, that's for a different post on another day.
If garbage data in equals garbage information out, does garbage big data in result in big garbage out?
I'm sure my sales and marketing friends or their surrogates will jump at the opportunity to tell me why and how big data is the solution to the decades old garbage data in problem.
Likewise, they will probably tell me big data is the solution to problems that have not even occurred or been discovered yet. Yeah, right.
However, garbage data does not discriminate or show preference towards big data or little data; in fact, it can infiltrate all types of data and systems.
Let's shift gears from big and little data to how all of that information is protected, backed up, replicated and copied for HA, BC, DR, compliance, regulatory or other reasons. I wonder how much garbage data is really out there, and how many garbage backups, snapshots, replicas or other copies of data exist? Sounds like a good reason to modernize data protection.
If we don't know where the garbage data is, how can we know whether there is a garbage copy of that data kept for protection on some other tape, disk or cloud? That also means plenty of garbage data to compact (e.g. compress and dedupe) to cut its data footprint impact, particularly in tough economic times.
Does this mean, then, that the cloud is the new destination for garbage data in different shapes or forms, from online primary to backup and archive?
Does that then make the cloud the new virtual garbage dump for big and little data?
Hmm, I think I need to empty my desktop trash bin and email deleted items, among other digital housekeeping chores, now.
On the other hand, I just had a thought about orphaned data and orphaned storage; however, let's let those sleeping dogs lie for now.