It’s always interesting to see the guesstimates the big brains come up with for figures and facts that are hard to verify. Here’s an example: how much data is computerized today? I’m not talking about ancient stuff, like the Codex Sinaiticus (which, incidentally, IS online at www.codexsinaiticus.org). I’m talking about the new and really important stuff, like the fourteen pictures that my stepdaughter posted on her Facebook account from our recent trip to Rock City.
Well, IDC figured that overall digital data had reached 1.2 ZB (zettabytes!) at the end of 2010. My mind boggles. OK, so that’s only 1.2 trillion gigabytes! Dr. Evil, please put your pinky to your mouth and say this huge number . . .
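1,319,413,953,331 GB (counting in binary units, where each step up from gigabytes to zettabytes is a factor of 1,024)
Another way to say it is that it’s about 1,229 exabytes, again in binary units; in decimal units it’s an even 1,200 exabytes.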
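The two readings differ because storage figures get quoted in both decimal units (1 ZB = 1,000 EB = 1,000,000,000,000 GB) and binary units (powers of 1,024, strictly ZiB, EiB and GiB). Here’s a quick sketch of the arithmetic, assuming just those two conventions; the function names are illustrative, not from any library.

```python
# Convert the IDC figure under both unit conventions.
# Assumption: decimal = SI prefixes (factor of 1,000 per step),
# binary = factor of 1,024 per step.

DECIMAL_STEP = 1000
BINARY_STEP = 1024


def zb_to_gb(zb, step):
    """Zettabytes to gigabytes: ZB -> EB -> PB -> TB -> GB is four steps down."""
    return zb * step ** 4


def zb_to_eb(zb, step):
    """Zettabytes to exabytes: one step down."""
    return zb * step


zb = 1.2
print(f"decimal: {zb_to_gb(zb, DECIMAL_STEP):,.0f} GB, {zb_to_eb(zb, DECIMAL_STEP):,.1f} EB")
print(f"binary:  {zb_to_gb(zb, BINARY_STEP):,.0f} GB, {zb_to_eb(zb, BINARY_STEP):,.1f} EB")
```

The decimal line prints 1,200,000,000,000 GB and 1,200.0 EB; the binary line prints 1,319,413,953,331 GB and 1,228.8 EB. Same data, about a 10% difference, depending purely on which convention the person quoting the number had in mind.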
You can get other numbers by extrapolating from the storage purchased from the major storage vendors. Of course, not all of the storage they sell is filled up right away. But it’s still an interesting number to hear. So, on nothing more than scuttlebutt from a friend of a friend of a friend, I heard numbers like this:
Online data in 2002: around 5 exabytes
Online data expected in 2011: around 700 exabytes
And again, since we’re surmising these values from the storage sales that various vendors publish, take them with a grain of salt. Still, data is growing at a ridiculous speed, doubling every fifteen months or so. Who knows where this will take us, but if we assume a constant rate of data growth (which is a bad bet, IMO), we’ll have around 100,000 exabytes of data online by 2020. Hey, but that’s 8 years after the Mayan calendar (and the world along with it) is supposed to end, right?
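The fifteen-month doubling time falls right out of the two scuttlebutt figures, and so does any projection. Here’s a minimal Python sketch, assuming smooth exponential growth between 2002 and 2011 and the same constant rate continuing from 2011 through 2020 (exactly the assumption I called a bad bet). The helper names are hypothetical.

```python
import math


def doubling_time_months(start, end, months):
    """Months per doubling, assuming constant exponential growth between two observations."""
    return months * math.log(2) / math.log(end / start)


def project(value, doubling_months, months_ahead):
    """Project a value forward at a constant doubling time."""
    return value * 2 ** (months_ahead / doubling_months)


# Scuttlebutt figures, in exabytes.
eb_2002, eb_2011 = 5, 700
months_2002_to_2011 = 9 * 12

t_double = doubling_time_months(eb_2002, eb_2011, months_2002_to_2011)
# Assumption: the same rate holds for the next nine years (2011 -> 2020).
eb_2020 = project(eb_2011, t_double, 9 * 12)

print(f"doubling time: {t_double:.1f} months")
print(f"projected 2020: {eb_2020:,.0f} EB")
```

Run as-is, it reports a doubling time of about 15.1 months and a 2020 projection of roughly 98,000 EB. Shrink the doubling time to 10 months and the projection jumps past 1.2 million EB, which shows just how sensitive these back-of-the-napkin extrapolations are.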
Another tool I really like is PowerWF, a really cool visual workflow builder that creates PowerShell scripts for you. It turns out that it integrates with PowerGUI!
This video shows two different ways to run PowerWF workflows from within Quest Software’s PowerGUI tool.
I’d like to make you aware of a recent paper by Bert Scalzo. The paper focuses on how DBAs can rely on Toad and Benchmark Factory to perform database workload replays, ensuring that changes to their databases don’t degrade the user experience.
I encourage you to read the paper and make workload replay a part of your database change management practices. As I’ve been saying for years, if you don’t have quantitative evidence of what normal is for your database, how can you know what is abnormal?
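One simple way to put numbers on “normal” is a statistical baseline: measure a workload before a change, then flag replayed measurements that land far outside it. This is a generic sketch, not how Toad or Benchmark Factory work internally. The sample response times are made up, the function names are illustrative, and the three-standard-deviation threshold is an assumption you’d tune for your own system.

```python
import statistics


def baseline(samples):
    """Summarize 'normal' as the mean and sample standard deviation of past measurements."""
    return statistics.mean(samples), statistics.stdev(samples)


def is_abnormal(value, mean, stdev, k=3.0):
    """Flag a measurement more than k standard deviations away from the baseline mean."""
    return abs(value - mean) > k * stdev


# Hypothetical response times (ms) for one query, captured before a change.
before = [120, 118, 125, 122, 119, 121, 124, 117, 123, 120]
mean, sd = baseline(before)

# Hypothetical replay of the same workload after the change.
after = [121, 126, 180, 119]
for ms in after:
    print(ms, "abnormal" if is_abnormal(ms, mean, sd) else "normal")
```

With these sample numbers, only the 180 ms measurement gets flagged. In practice you’d baseline many metrics (CPU, I/O, waits, per-query timings) rather than a single series, but the principle is the same: without the “before” numbers, there’s nothing to compare the “after” against.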
Last year, some of my friends from Quest Software attended Hadoop World in New York. Back in 2009, I never would’ve guessed that Quest would be there as a major sponsor, with products, community initiatives, and presenters.
There were just under 1,000 attendees, and they weren’t the typical devheads and geekasaurs you’d normally see at very techie events like Code Camps, SQL Saturdays, Cloud Camps, or even other NoSQL events such as the Cassandra Summit. We’re talkin’ enterprise customers with active Hadoop projects underway.
Some observations from the show that may be of interest to you:
- Hadoop World was a trending topic on Twitter while the show was running.
- Hadoop has “arrived,” with the average cluster at 66 nodes weighing in at 114 TB. (For the philosophers among us, how much does a terabyte weigh?) The most famous Hadoop cluster is Facebook’s, with a trifling 30 PB of storage (that’s petabytes). That’s more information than mankind has ever written, cumulatively, including the Advice on Men column from Cosmo Magazine. Unfortunately, that’s only a few hundred thousand pictures of teenagers pursing their lips at themselves while holding a digital camera in front of the bathroom mirror. They’re expecting about 60 PB by the end of 2011.
- HP was there, creating a lot of buzz on the hardware side. Quest was there as the leading independent tool maker for cloud apps.
- The OraOop connector for Oracle got attendees’ pulses racing, since many want a high-speed, scalable connector between Oracle and Hadoop to fill a necessary gap. I’m not sure whether there’s something in place for SQL Server, and I’m not currently aware of any high-speed connectors built into SQL Server Integration Services.
Here’s some other good coverage of the show worth checking out:
All of this is very important because NoSQL in general, and Hadoop in particular, are picking up speed and momentum. Even if your organization isn’t using NoSQL technology today, chances are very good that your CIO will be asking you for details on how and when it should be deployed. And if you don’t think it should be deployed, the CIO’s natural response is “Why not?” So you’d better get your ducks in a row, Mr. SQL Server DBA.
There are lots of great sites for Hadoop information, but I invite you to take a gander at Jeremiah Peschka’s (blog | twitter) blog for plenty of NoSQL goodness. Start with Jeremiah’s blog post here, and ignore all indications that you might be in a biker bar or a San Francisco tattoo parlor. That’s just Jeremiah’s style.
His Hadoop writings are here, though lately he’s been writing a lot about Riak, which sounds like a euphemism for vomiting, as in “Jeremiah spent a lot of time riaking after chugging that bottle of cough syrup.”
Compliance is one of the most interesting elements of any data management plan, because it’s a microcosm of evolution in action. When many of the laws that govern data retention were first enacted, businesses weren’t collecting a lot of information. Now, data collection happens everywhere. And as citizens have come to realize that more and more information about their daily lives is being recorded, they demand that their governments provide privacy and protection against misuse of that data.
If managing your corporate data for the long term isn’t currently on your mind, it should be, and in several different ways: cost, performance, business continuity, and compliance.
This vlog entry on www.SQLServerPedia.com shows SQL Server expert Kevin Kline discussing his views on how to be both efficient and effective in your day-to-day work and in your career. It’s aimed at the SQL Server professional, but it’s good for anyone.
One thing I really enjoy about the SQL Server community is its vibrancy. I’ll give you details on the SQL Server community’s explosive growth in a moment, but let’s start by comparing Microsoft SQL Server’s user community with those of other significant database platforms.