ETL Demo with Data From Data.Gov

A little over a month ago, I wrote an article (Is There Such a Thing as Easy ETL) about expressor software and their desktop ETL application, expressor Studio.  I wrote about how it seemed much easier than the native ETL tools in SQL Server when I was reading up on the tool, but that the “proof would be in the pudding” so to speak when I actually tried it out loading some free (and incredibly useful) data from the US federal data clearinghouse, Data.Gov.

If you’d rather not read my entire previous article – quick recap, expressor Studio uses “semantic types” to manage and abstract mappings between sources and targets. In essence, these types are used for describing data in terms that humans can understand—instead of describing data in terms that computers can understand. The idea of semantic abstraction is quite intriguing and it gave me an excuse to use data from data.gov to build a quick demo. You can download the complete data set I used from the following location: International Statistics.  (Note: I have this dream that I’m going to someday download all of this free statistical data sets, build a bunch of amazing and high-value analytics, and make a mint.  If, instead, YOU do all of those things, then please pay to send at least one of my seven kids to college in repayment for the inspiration.  I’m not kidding.  I have SEVEN kids. God help me).

The federal government, to their credit, has made great progress in making data available.  However, there is a big difference between accessing data and understanding data. When I first looked at one of the data files I downloaded, I figured it was going to take me years to decrypt the field names. Luckily, I did notice an Excel file with field names and descriptions. Seriously, there are single letter field names in these files where the field name “G” has a description of “Age group indicator” (Oh Wow).  See the figure below.

It’s stuff like this that reminds me why we have data quality and master data management tools.  Ok, back to expressor Studio. I quickly mapped a couple of files into expressor Studio using their “Read File” operator. It was fairly simple and easy to use. My data included files with country area information, population, and gender information by year. Once I mapped these files I quickly wanted to shed the default cryptic, nay, nonsensical names. I could have just renamed the fields when I initially mapped them into the system but that would mean I would have to manage the names in three separate locations. Bah! It made more sense to create a common semantic type and reuse it across all three files.

There are two flavors of semantic types within expressor Studio to handle your mappings, atomic types or composite types. An atomic type is simply a single field name whereas a composite type is a combination of one more atomic types. Since the data files had many common fields, I decided to create a core set of atomic types that I could then roll up into composite types based on the files I was mapping. This kept the mappings simple and easy to understand and most importantly the whole exercise took about 5 minutes. Once the types were created I simply mapped the cryptic names from the files to the business friendly names in my semantic type.  (I can’t even begin to imagine how long this would’ve taken to build using native tools, but certainly not 5 minutes).

Now I was ready to move my data. I took the data from three files and combined them into one master dataset. From there, my international statistics from Data.Gov were pumped right into my waiting SQL Server database.  Note that I could’ve used Excel or just about any other database as my target instead of SQL Server.

Now, you might be saying to yourself “That looks easy because you read all the help files first.”  Actually, no.  In fact, some of my buddies like to lovingly tell me to “RTFM” from time to time.  It’s not that it offends my masculinity to read a manual.  I just usually like to have a go first and then, if needed, go back to the manual.  In fact, all I really used was this 5-minute demo video that in noticed when I was downloading the tool.

If you’re tackling ETL and you want it fast and easy, then you might want to check out their website, www.expressor-software.com, to learn more about the expressor company and products.

Enjoy!

-Kev

P.S. Follow me on Twitter!

 

 

Speak Your Mind

*

This site uses Akismet to reduce spam. Learn how your comment data is processed.