I read too much, and that, my friends, is an entirely separate topic for a blog post. But I thought I’d share with you a little more about what I’m reading because sometimes, if I’m lucky, it might be something you’d enjoy too.
So I’m going to start sharing what I’m reading at least once per week, partly so that I don’t firehose too many reading links directly into your brain (were I to do it, say, once per month) and partly to solidify in my own mind the information that I’m reviewing. So here are a few good links for the seven days leading up to July 22, 2011:
Microsoft and White House partnership on BigData: BigData isn’t a particularly new concept. But I was intrigued to learn that the National Science Foundation, Microsoft, and 13 other teams were partnering on developing better BigData analytics for lots of government data from activities such as healthcare, economic development, education, transportation, and the power grid. Cool stuff! Plus, Microsoft has developed a new tool called Project Daytona to better harness the power of the cloud, in general, and Windows Azure, specifically.
While we’re on the topic of Federal IT in the Cloud, be sure to read this linked article from ComputerWorld. Say what you will about our government, but putting government IT in the cloud and increasing both its transparency and availability will make a huge difference in how the Federal government will be able to serve the public. We’re talking as big a difference as corporations experienced between the “catalog on the web” experience of the 1990s and the Web 2.0 experience of today.
And if you think Microsoft is still towing the relational database barge without thinking about other technologies, you need to read up on Projects Dryad and Daytona.
E.T.L. That’s Extract – Transform – Load. That doesn’t sound like a lot of work when all you need to get loaded is a simple Access database or an Excel spreadsheet. In a situation like that, the process is so simple, all you really need to focus on is the L in ETL. There’s not a whole lot of E.T. to process, despite how wonderful that movie is. [pun intended] But as soon as your data loading process involves some difficult or sophisticated cleansing or transformations, it gets really, really hard.
The other cross-thread that has really caught my interest lately is the U.S. federal government’s Open Data Initiative. I think it’s remarkable that President Obama is the first president to appoint a federal CIO. (Shouldn’t that have happened in the past?) In addition, President Obama instructed the entire executive branch to open up their data (where security isn’t at risk) and make it readily available to the public. The US government collects mountains of interesting and valuable data for its own uses, but figuring out how, or with whom, to share it was always an afterthought. While I was a contractor for NASA, for example, I worked on some incredibly interesting projects which yielded amazing and commercially valuable information. It was all public domain. But unless you knew it was there, you couldn’t get to it. Making use of all of that data always intrigued me.
Now, with ODI, it’s all being put on the internet at an ever-increasing rate at Data.gov. However, all of this data, while open and available, is not standardized. Some data sets might be a CSV file, while others might be something like a spreadsheet. That means you’ll need to extract, transform, and load that data if you want to synthesize more valuable data sets.
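To make the extract, transform, and load steps concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the sample CSV text, the column names, and the population table are made up, and an in-memory SQLite database stands in for whatever target you would really load.

```python
import csv
import io
import sqlite3

# Made-up sample data standing in for a downloaded CSV file (hypothetical).
RAW_CSV = """Region, Population ,Year
North,"1,200",2010
South,"3,450",2010
"""


def extract(text):
    """Extract: parse CSV text into a list of dicts keyed by the raw headers."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: tidy the header names and convert numeric strings to integers."""
    cleaned = []
    for row in rows:
        tidy = {key.strip().lower(): value.strip() for key, value in row.items()}
        population = int(tidy["population"].replace(",", ""))
        cleaned.append((tidy["region"], population, int(tidy["year"])))
    return cleaned


def load(records, conn):
    """Load: write the cleaned records into a (hypothetical) target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS population "
        "(region TEXT, population INTEGER, year INTEGER)"
    )
    conn.executemany("INSERT INTO population VALUES (?, ?, ?)", records)
    conn.commit()


def run_etl(text, conn):
    """Run the three steps in order against one source."""
    load(transform(extract(text)), conn)
```

Even in this toy case, most of the code lives in the transform step (messy headers, numbers stored as text with thousands separators), which is where real data sets tend to get hard.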
For those reasons, I’ve been researching tools to help make this process easier. (I also wanted to research SSIS and ETL tools for my Tool Time column in SQL Server Magazine.) Now, I’ve been following expressor software for quite some time and really like their unique approach. (I actually ran into the expressor software team at a PASS Summit one or two years ago and asked for a demo of their software. And I really liked what I saw.) Rather than the workflow approach used by SSIS, expressor software uses a data mapping approach combined with reusable business rules. Their mapping approach is fundamentally different from the traditional point-to-point, source-to-target mapping paradigm. Basically, you can define a semantic type representative of your business data, create one or more business rules to apply to the data, and then implement a “canonical” mapping which connects data sources and targets to that same semantic type. And it’s free!
Abstraction is Awesome
What’s cool about that? Don’t forget that “semantic” means “meaning”. So a semantic type is an abstraction of the meaning of the data. The net result is that expressor shields your data integration application, with its associated business and transformation rules, from changes to the underlying source or target files, such as when files with different field names and data type representations have to be processed.
For example, let’s assume that you need to process invoices from different vendors in slightly different formats. If you use a traditional ETL tool like SSIS, any changes in the source and/or target formats will require you to modify your data mappings and transformation rules, because the mappings are tied directly to the metadata structure of the invoice file format(s). expressor, on the other hand, lets you define a common “invoice” semantic type, build all your downstream data processing off that type and map one or multiple invoice file schemas to the type.
This approach greatly simplifies the mapping process and provides for more flexible data integration applications that can be more easily adapted to changes in the source and target data sources.
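The invoice example can be sketched in a few lines of plain Python. To be clear, this is not expressor’s API or file format; it is only an illustration of the idea, and the vendor names, field names, and the is_large rule are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Invoice:
    """The shared 'invoice' semantic type that all downstream logic targets."""
    invoice_id: str
    amount: float


# Each vendor schema maps its own field names onto the semantic type.
# Vendor names and field names here are invented for illustration.
VENDOR_MAPPINGS = {
    "vendor_a": {"invoice_id": "InvNo", "amount": "Total"},
    "vendor_b": {"invoice_id": "invoice_number", "amount": "amt_due"},
}


def to_invoice(vendor, record):
    """Map one vendor-specific record onto the common Invoice type,
    converting data types along the way."""
    mapping = VENDOR_MAPPINGS[vendor]
    return Invoice(
        invoice_id=str(record[mapping["invoice_id"]]),
        amount=float(record[mapping["amount"]]),
    )


def is_large(invoice, threshold=1000.0):
    """A business rule written once, against the semantic type only."""
    return invoice.amount >= threshold
```

With this shape, a vendor changing a column name means editing one entry in VENDOR_MAPPINGS, while is_large and any other rule built on Invoice stay untouched. For example, to_invoice("vendor_a", {"InvNo": 17, "Total": "1250.50"}) and to_invoice("vendor_b", {"invoice_number": "B-9", "amt_due": 80}) both produce Invoice objects that the same rule can evaluate.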
expressor Studio Desktop
Benefits Abound
Since the semantic types in expressor are captured as reusable artifacts, you can also reuse them in new data flows within your project(s). You can even share them across your entire organization. As I tinkered with the expressor Studio tool, I hit on a few other benefits of this approach:
Handles data type conversions automatically without having to write data transformation rules for these conversions
Builds new semantic types from existing types and reuses types in existing and new applications
Creates multiple, reusable business rules against a single type and applies them repeatedly as needed
Easily implements data quality rules and constraints
In an Ideal World…
In an ideal world, I’d figure out some brilliant way to make money from bringing together all kinds of the government data that I used to work with. Other folks are doing it at the Windows Azure Data Market. But in the meanwhile, I’m also looking forward to tinkering with this data to build better demos. Along the way, I’m going to use the expressor Studio desktop ETL tool (Did I mention that it’s free?) as well as tell you about my experiences as I try to build out some Data.gov data sets.
Those of you who know me know that I like a good discussion and cooperative, constructive teamwork. So I encourage your feedback and suggestions as I work through these data integration challenges and share my experiences. I’m looking forward to sharing with you my insights on what the expressor data integration software can do with this challenge and what some of its features and capabilities are. In upcoming releases, I’ll let you know what I find intriguing and worth mentioning.
I first wrote about Mladen Prajdic’s excellent tool in my Tool Time column at SQL Server Magazine HERE. The tool is a nice plug-in to SSMS and definitely worth having. If you’ve never installed it or have only installed an older version, be sure to pick up the newest release. Here’s Mladen’s press release complete with hyperlink for the tool: SSMS Tools Pack 1.9.4 is out! Now with SQL Server 2011 (Denali) CTP1 support.
As Mladen says:
…this release adds support for SQL Server 2011 (Denali) CTP1 and fixes a few bugs. Because of the new SSMS shell in SQL 2011 CTP1 the SSMS Tools Pack 1.9.4 doesn’t have regions and debug sections functionality for now. The fixed bugs are:
A bug that prevented to create insert statements for a database
A bug that didn’t script commas as decimal points correctly for non US settings….
There are so many great tools out there for data professionals using Microsoft SQL Server. I really like to see all of these great tools made free to the public. On the other hand, I’m bummed that the tools are cast about in a very decentralized fashion. If you haven’t done migrations before, you might want to start with these good white papers first.
Here are a handful of cool migration tools worth mentioning:
SQL Server Migration Assistant (SSMA) for Oracle: Migrate from Oracle to SQL Server 2005, SQL Server 2008 or SQL Server 2008 R2. I’m thinking about installing it on my SQL Servers even without needing to migrate existing Oracle databases to SQL Server. Why? Well, as an old Oracle hand, I came to really enjoy quite a few Oracle PL/SQL system packages (kind’a like a SQL Server system stored procedure, but often more powerful). As it turns out, SSMA-Oracle includes stored procedures, extended stored procedures, and CLR routines that reproduce the functionality in almost all of the cool and powerful Oracle packages like DBMS_PIPE. It’d be nice to have those on my SQL Servers just because I know them and like them.
Microsoft Services for Mission Critical Customers: Many enterprise customers running mission critical applications on SQL Server have asked for more – more service and support for their environments. This is an add-on that costs extra, but it’s worth it for those running the systems that keep the company in business.
If you’ve tried any of these tools out, I’m keen to hear your experiences. Did they work well for you? Did they work, though poorly? Did they fail utterly? Inquiring minds want to know.
I profiled Adam Machanic’s (blog | twitter) excellent stored procedure, SP_WHOISACTIVE, back in August of 2010 in my monthly SQLMag column, Tool Time. Adam has been diligent about maintaining the tool and adding new features. Read the details on my SQLMag Tool Time column.
Another tool I really like is PowerWF. PowerWF is a really cool visual workflow builder that creates PowerShell scripts for you. Turns out that it integrates with PowerGUI!
This video shows 2 different ways that PowerWF Workflows can be run from within Quest Software’s PowerGUI tool.
I was just bragging about how Toad for SQL Server keeps getting better. In that post, I also pointed out a lot of great resources you can put to work immediately on improving your skills with this great tool. (Incidentally, there’s a freeware version without all of the features, but it’s still quite useful. And you can always use the beta product, if you want all of the features and many new features that are undergoing community testing.)
Ain't he handsome?
One of the reasons that Toad is so good is that it’s always been a community-driven product. Back when I used Oracle every day, TOAD was an acronym for Total Oracle Application Development. It didn’t take long for Toad to rise above the acronym and transform into the eponymous term denoting “kick-butt database tool” just a few years before Toad began to go cross-platform. Now that Toad is solidly cross-platform with versions for DB2, MySQL, and Cloud to boot, it’s worth pointing out that Toad got to be what it is today entirely from community feedback. Back in the day, when I worked in Quest’s R&D team, the developers literally kept a checklist of cool suggestions from the community and worked against that to develop new features. My point isn’t to fully describe the inner workings of the Toad dev team; rather, I wanted to highlight how incredibly important community feedback is to this tool and the developers behind it.
Product documentation and product training are two areas where our customers consistently press us to improve…and we take that very seriously.
Bold Claims
One of my favorite tools in the Quest Software toolbox for SQL Server (and Oracle) DBAs is called Foglight Performance Analysis, or more commonly, PA. This product can do things that no other tool or amount of customized scripts can ever reproduce. I am dead serious about this claim.
Here you’ll find just about any and all documents you could possibly need, from initial evaluation, through the demo and proof-of-concept (POC) phase, and on through implementation and on-going management. Do we have more documentation? Sure, but this list contains the key documents you’ll most likely want to see.
Training?
I’ve also gotten a lot of questions about training on the Quest tools – Do we offer it? How much does it cost? When do the classes run?
The quick answer is YES! We offer very nice training for a mere $350. Head over to www.quest.com/foglight-performance-analysis-for-sql-server and you’ll see a link to “Find out about Technical Training” that links to http://www.quest.com/sql-training-leadthem/. Once you register, you’ll get to take part in two 2-hour fully remote offerings. The first class is focused on sizing, configuration, and setup of PA, while the second teaches you how to use the product.
Toad Data Analyst, the Reporting Tool for non-technical types
If you’ve attended any of my public sessions about SQL Server technology, then you might remember that I extend a standing offer to provide a free, long-term license to any of several products from Quest Software, such as Toad for SQL Server (including the SQL Optimizer), Toad Data Analyst, Toad Data Modeler, and the awesome performance and scalability testing tool Benchmark Factory.
If you’ve ever wondered about these tools and why I tout them, why don’t you take a couple minutes to look at the on-line demos available at each of the preceding links? If you like what you see, drop me a note and I’ll get you that license I was blabbing about. I thank you and my children thank you! <grin>