In last month’s column, “2012 Might Really Be the End of the World as We Know It,” I described a number of major developments in the IT industry that are likely to disrupt the life of database professionals everywhere. I categorized those four disruptors – virtualization, cloud computing, solid state drives (SSDs), and advanced multi-core CPUs – into two broad groups. I’m going to continue my analysis of these disruptive technologies in reverse order. Today, let’s discuss SSDs.
In this episode of the uber-popular podcast “RunAs Radio,” host Richard Campbell asks me which topics are of particular interest at Tech-Ed 2011, focusing the discussion on the cloud and SQL Server “Denali.”
You can download the MP3 version of the podcast or the transcript here.
I have to confess that I’m incredibly excited about BigData. I haven’t been this excited about new innovations in IT since relational databases first appeared on the scene early in my career. But what is BigData?
I can still feel the echoes of adrenaline from those days, when I was hired to work on a NASA project that would involve over 100MB of data. ONE HUNDRED MEGABYTES! Good grief, that was fantastically huge to us on the team. (That database was over 130MB when I finally moved on to another project.) And remember – PC software was installed using 640KB floppy disks at the time. In fact, my Oracle v5 instance required shuffling through about a dozen floppy disks to get the thing installed on a 286 IBM PC.
BigData today takes on an entirely new meaning as database sizes scale into the petabytes. But the emphasis is the same today as it was back in the 1980s – turning data into actionable information. However, with BigData, we can achieve amazing new insights from this data and mine for tidbits that would never have seen the light of day with smaller data sets.
The two major themes to remember about big data are 1) the more data you have on a given domain, the more power you have, 2) the better the analysis you can perform on the data, the more power you have. In fact, theme 2 might be the most important thing to consider because lots of data is meaningless unless you can extract knowledge from it. And that’s where better analytical techniques come into play.
Here are some articles about Big Data that you might enjoy:
It’s always interesting to see the guesstimates of the big brains about facts and figures that are hard to verify. Here’s an example – how much data is computerized today? I’m not talking about ancient stuff, like the Codex Sinaiticus (which, incidentally, IS online at www.codexsinaiticus.org). I’m talking about the new and really important stuff, like the fourteen pictures that my step-daughter posted on her Facebook account from our recent trip to Rock City.
Well, IDC figured that overall digital data was up to 1.2ZB (zettabytes!) at the end of 2010. My mind is boggling. OK, so that’s only about 1.2 trillion gigabytes! Doctor Evil, please put your pinky to your mouth and say this huge number . . .
1,319,413,953,331 GB
Another way to say it is that it’s about 1,228 Exabytes.
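If you want to check this kind of conversion yourself, here is a minimal sketch. It assumes binary (power-of-two) prefixes, which is what the gigabyte figure above appears to use; the `convert` helper and the constant names are hypothetical, made up just for this illustration.

```python
# Binary (power-of-two) unit sizes in bytes. Assumption: the figures in the
# text use binary prefixes rather than decimal (10**9, 10**18, 10**21) ones.
GB = 2**30
EB = 2**60
ZB = 2**70


def convert(value, from_unit, to_unit):
    """Convert a size expressed in from_unit into to_unit."""
    return value * from_unit / to_unit


zettabytes = 1.2  # the IDC estimate for the end of 2010, as quoted in the text

print(f"{convert(zettabytes, ZB, GB):,.0f} GB")
print(f"{convert(zettabytes, ZB, EB):,.1f} EB")
```

Swapping in decimal prefixes would give a flat 1.2 trillion GB instead, which is why the two ways of stating the number differ by about ten percent.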
You can get other numbers by extrapolating from storage purchases from the major storage vendors. Of course, not all of their storage sold is actually filled up right away. But it’s still an interesting number to hear. So just on scuttlebutt from a friend of a friend of a friend I heard numbers like this:
Online data back in 2002: around 5 Exabytes
Online data expected in 2011: around 700 Exabytes
And, again, we’re surmising these values based on published storage sales from various vendors. This data growth is hurtling along at ridiculous speed, with data doubling every fifteen months or so. Who knows where this will take us, but if we assume a constant rate of data growth (which is a bad bet, IMO), we’ll have 996,000 Exabytes of data online by 2020. Hey, but that’s 8 years after the Mayan calendar, and the world along with it, is supposed to end, right?
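For anyone who wants to play with the extrapolation, here is a rough sketch of constant-rate doubling. It assumes the scuttlebutt figures above (5 Exabytes in 2002, 700 Exabytes in 2011, doubling every fifteen months) and measures time in whole calendar years; the `project_exabytes` function is hypothetical, written just for this illustration.

```python
def project_exabytes(start_eb, start_year, end_year, doubling_months=15):
    """Project data volume forward, assuming it doubles every doubling_months."""
    months_elapsed = (end_year - start_year) * 12
    doublings = months_elapsed / doubling_months
    return start_eb * 2 ** doublings


# Sanity check: does a 15-month doubling roughly connect 2002 to 2011?
print(f"2011 from 2002: {project_exabytes(5, 2002, 2011):,.0f} EB")

# Projection forward from the 2011 figure.
print(f"2020 from 2011: {project_exabytes(700, 2011, 2020):,.0f} EB")
```

Treat whatever comes out as an order-of-magnitude guess at best. Nudging the doubling period by a month or two, or shifting the starting year, moves the 2020 result a long way, which is exactly why assuming a constant rate is a bad bet.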
If you spend any time at all reading IT trade journals and websites, you’ve no doubt heard about the NoSQL movement. In a nutshell, NoSQL databases (also called post-relational databases) are a variety of loosely grouped means of storing data without requiring the SQL language. Of course, we’ve had non-relational databases far longer than we’ve had actual relational databases. Anyone who’s used products like IBM’s Lotus Notes can point to a popular non-relational database. However, part and parcel of the NoSQL movement is the idea that the data repositories can horizontally scale with ease, since they’re used as the underpinnings of a website. For that reason, NoSQL is strongly associated with web applications, since websites have a history of starting small and going “viral,” exhibiting explosive growth after word gets out. [READ MORE]
I was once asked what I thought Microsoft’s overall product trajectory for SQL Server was, in light of Oracle’s rather obvious trajectory of acquiring multiple application vendors who will, in turn, deploy more and more of their applications to the Oracle database platform. To be honest, I had a little difficulty perceiving a clear and concise strategy statement for the sort of work going on in Redmond. I could see a lot of great features being developed. And I knew the SQL Server development team had developed a lot of new “plumbing” with each new release – features like Service Broker and Extended Events and exponentially more robust capabilities in the Analysis Services product lines. But the strategy itself was veiled and, since Microsoft wasn’t explicitly telling us what the grand strategy was, I had difficulty putting my finger on it. [READ MORE]
In a two-part article over the next two months, I’m going to address an important issue for the SQL Server community: the future direction of coding for SQL Server, as directed by Microsoft. I’ll start by telling you a bit about the current situation with writing code on and for SQL Server, and, in the next installment, talk more about the ramifications brought on by the current coding environment.
I’m curious if you agree with my assertions. You also have the added advantage of hindsight, since I wrote these a while ago.
It doesn’t seem like it was that long ago that my company’s IT department was bracing for a major new line of work. Back in the mid 1990s, we were going full steam into client-server technology. At the same time, we were significantly expanding our workforce. The IT department, which had spent years as an old-style mainframe shop, was suddenly inundated with requests for new workstations, network user IDs, new network domains, permission requests, and requests for application access privileges. Our lone mainframe permissions person quickly felt overwhelmed and a little baffled by all of these new privileges and provisioning needs. Within a year or two of our first client-server application, we went from one to three staffers working full-time granting access to the various applications and network resources within our environment. [READ MORE]
Moore’s Law tells us that CPUs get a LOT faster over time. Unfortunately for the database professional, the secondary elements of our databases DO NOT get a lot faster over time. Overall, the main methods of storing data since the 1960s, magnetic tape and hard disks, have improved only by single-digit percentages year over year. Even those of us who were never good at math can tell that the CPU is outpacing the other system components.
An Osborne Executive portable computer, from 1982, and an iPhone, released 2007. The Executive weighs 100 times as much, has nearly 500 times the volume, cost 10 times as much, and has a 100th the processing power of the iPhone.
Two recent developments are helping to change that equation. First, solid state drives (SSDs) are having a dramatic impact on many IT scenarios. My friends, Brent Ozar and Paul Randal, have each written about SSDs here and here, respectively.
Second, database vendors are supporting relational database systems that run entirely in system RAM. If you’d like to learn more about in-memory databases (IMDB), read more in my new article in Data Management Magazine. As we look to the future, I expect to see a lot more of both technologies in the data center.
Just wanted to let you know that a TechNet Radio episode and interview I did about cloud computing is now live on TechNet Edge. It was the featured spot on Thursday, June 3rd and is also featured on the TechNet homepage.
I’ve been trying to wear more of an analyst’s hat these days, so this webcast has a lot of my “deep thinking” on issues related to cloud computing – hopefully at a higher level of quality than Jack Handey.
A salient point that I think many analysts are overlooking is the changing nature of data as it exists in the cloud. For decades, data has primarily been about people (and their activities) for consumption by other people. The cloud is enabling a major shift in data generation and consumption where data is produced by machines for consumption by other machines. We’ll soon be looking at situations, now rather rare, in which sensors are extremely commonplace. These sensors, whether they be in traffic signals or high-end medical devices, will create enormous amounts of data far more frequently than ever before, loading that data directly into cloud databases. The cloud databases will consume and process the data, and automated analysis (made all the easier through features like StreamInsight in SQL Server 2008 R2) will flag important findings for review by a real-live human being. Check out the interview for several real-world examples being played out even as we speak.
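To make the machine-to-machine idea a little more concrete, here is a toy sketch of automated analysis over a stream of sensor readings. It is plain Python, not StreamInsight or any real product’s API, and everything in it is an assumption for illustration: the `flag_outliers` function, the five-reading window, the three-standard-deviation rule, and the sample temperatures are all made up.

```python
from collections import deque
from statistics import mean, stdev


def flag_outliers(readings, window=5, threshold=3.0):
    """Yield (index, value) for readings far from the recent rolling average.

    A reading is flagged when it sits more than `threshold` standard
    deviations away from the mean of the previous `window` readings.
    """
    recent = deque(maxlen=window)
    for index, value in enumerate(readings):
        if len(recent) == window:
            mu, sigma = mean(recent), stdev(recent)
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                yield index, value
        # The reading joins the window either way, flagged or not.
        recent.append(value)


# Hypothetical temperature readings with one obvious spike.
temperatures = [20.1, 20.3, 19.9, 20.0, 20.2, 20.1, 35.7, 20.0]
for index, value in flag_outliers(temperatures):
    print(f"Reading {index} ({value}) needs a human to take a look")
```

The point of the sketch is the division of labor: the machine chews through every reading, and a person only hears about the handful that look unusual.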
Perhaps I can persuade you to blog, tweet, or place a link to it in your Facebook or team newsletter? Maybe with a few deep thoughts? Please? Pretty please?
And I welcome your deep thoughts and responses here.