What’s the Data Modeling Standard for Business Intelligence Systems?

I had the good fortune to overhear some of my good friends and fellow MVPs discussing the process of data modeling for business intelligence systems.  So what are the industry standard approaches for modeling dimensional data modelling?
The short answer is that Kimball’s dimensional modeling is the most widely adopted standard for any sort of data warehouse. Inmon is also well respected.
But the consensus is to stick with Kimball, especially if you’re looking for something an enterprise standards team is going to recognize as “standard”. The book on my shelf and which I most frequently recommend is The Data Warehouse Toolkit: Complete Guide to Dimensional Modeling

http://www.amazon.com/The-Data-Warehouse-Toolkit-Dimensional/dp/0471200247/ref=sr_1_1?ie=UTF8&qid=1342453858&sr=8-1&keywords=data+warehouse+toolkit

There’s a version of this book specifically for the Microsoft platform, focusing on SQL Server Analysis Services (SSAS). But I recommend the generic one because the platform-specific version compresses some of the general concept sections to make room for the Microsoft-specific content. The Kimball book does a good job describing how important it is to get the conformed dimensions and the precise measures most needed for the BI application. Otoh, the platform-specific version does point out some nice tools available for download from Microsoft’s website.  Fortunately, you can just go the “Tools & Utilities” tab from this page:   http://www.kimballgroup.com/data-warehouse-business-intelligence-resources/books/microsoft-data-warehouse-dw-toolkit/.
Of course, if you’re using a data modeling tool, the specific notation and design patterns might vary a little bit because the tool offers only a given set of workflows or symbols.  Also, some industries have already mapped out specific pattern data models along with six or seven industry standard figure data models, many of which are free if you know where to look.  [Note: I don’t know where to look. So if you do, please post a comment with this important insight!] You might, for example, apply the set of common patterns made popular in financial BI apps, in which you have a staging/ETL area, data marts, and a data warehouse and then carefully measure how quickly and reliably data reaches the user. After all, BI is much more than just the cubes, reporting, dashboards, and event subscriptions of an SSAS/SSRS/SSIS implementation.  It must be useful for and usable by the end-users.
So, I’m curious – what standards patterns and notations are you using? What data modeling tools are influencing your design?  Have you taken advantage of the various free industry patterns out there?  Inquiring minds want to know!  Post your comment here.  And, as always, thanks!

-Kevin
-Follow me on Twitter!
-Google Author

P.S. A special thanks to Bob Duffy (Blog), Davide Mauri (Blog), Robert Pearl (Twitter | Blog), Audrey Hammonds (Twitter | Blog), Karen Lopez (Twitter | Blog), Thomas Ivarsson (Twitter | Blog), Chistian Cote (Twitter | Blog), and Dr. Greg Low (Twitter | Blog) for letting me eavesdrop on their very informative conversation! Be sure to read their blogs and follow them on Twitter.

Comments

  1. The third edition (2013) of the Kimball DImensional Modelling Toolkit should be available soon as well. Not sure what will be different though.

    http://www.amazon.com/Data-Warehouse-Toolkit-Definitive-Dimensional/dp/1118530802/ref=sr_1_1?s=books&ie=UTF8&qid=1372790947&sr=1-1&keywords=data+warehouse+toolkit

  2. Great point, Andy! Thanks for the feedback. -Kev

  3. Thomas Ivarsson says

    Hi,
    http://www.kimballgroup.com both have a long list of design tips and an interesting discussion Group. I can recoomend “Master Data Warehouse FDesign” by Claudia Imhoff and others.

    The most comment design pattern is a stage database, the data warehouse database and a publish database(also called a data mart). The data mart mostly use the starschema as the model.

    My primary data modelling tool are whiteboards, pen and paper,snd sticky notes on White boards, MS Word figures and occasionally Visio.

  4. Hitesh Shah says

    My 2 cents .

    From the glaring success of Qlikview and my own experiences at our shop , I feel that even without DW & cubes good ,easy to deploy and use BI can be done by
    1. Copying summary data / KPI / measures which is not even 1% of overall data say daily to another place.
    2. Showing the summary kpi to users
    3. Showing details and transactional level info runtime from runtime database.

  5. This is always an interesting discussion, and learning by reading what others do (and don’t do) regarding data modelling in BI is important.

    I find it interesting that in my experience (and maybe more so locally here than internationally), the BI-projects and BI-community has not fully adapted the need for and use of standard notations and patterns as known in the general system development community. A lot of BI professionals have different backgrounds than IT professionals (economics, management, etc.), and the common ground when it comes to communicating data needs simply aren’t there. This has been a challenge in the communication between data modelling professionals / solution architects and customers since forever, but this might be an even bigger challenge within BI.

    I tend to end up with something close to the Kimball dimensional model representing the final data model, but how we get there is highly dependent on the project, the customer, their background and the organization in general. I usually start out with a set of high level use cases written in collaboration with the customer to illustrate the final goal and use of the BI-system. I say “use cases”, but it could just as well be high level user stories. From there we move towards a simple conceptual model representing the most important terms with their internal relationships. The customers is challenged to define these concepts unambiguously without me mixing in the concepts of facts, dimensions, measures etc. at this point. Then we iterate over the concepts and refine it closer to the standard model untill we get it to the satisfactory level of detail. I’m a bit old school when it comes to emphasizing the importance of data modelling and getting this right. It’s expensive to correct these errors later on.

    And the models visualizing the processes is, of course, a whole different topic, both behavior and dependencies need some extra attention.

  6. Thomas and Hitesh, great insight! Thanks for sharing.

    Tone, hi to you up in Norway. =^) Your comment is full of good context and nuance. It’s a bit of a shame that management and the executive suite always think that a quick binary choice is available. So much of the time, the best way for us to tackle a problem is to use a few elements of several approaches, as you mention, to deliver the best solution for the customer. Good comment!

Speak Your Mind

*

This site uses Akismet to reduce spam. Learn how your comment data is processed.