Monday, September 20, 2004

Metadata-Driven Data Warehouse

The topic of the metadata-driven (MD) data warehouse, along with the Common Warehouse Metamodel (CWM), code generation and logic rule systems, fascinates me...

Having worked in corporate environments where the core business I was involved in was data validation & data warehouses, I can't believe more effort hasn't been directed to this area. The reason for my focus on this is that I am currently involved in delivering (in the 3 weeks I have left!) an MD code generation system which will use SQL code behind the scenes to build a relatively simple data warehouse with basic ETL steps... Why code generation for the data warehouse?

First, I'll start by stating that I believe there are 3 symptoms that you can diagnose early to determine if code generation should be used on a project:
Repeated logic - the same (or very similar) thing happening to different stuff everywhere.
Hard-coded logic rules - the description of "what" we do to our data isn't abstracted from the way the system actually works.
Unconstrained conceptual entities - things that are conceptually the same in two places, but are represented either differently or separately, and don't share a relationship.
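As a rough illustration of the second symptom, here is a minimal sketch in which the validation rules live in data rather than in code. The column names, the two rule kinds and the `validate` helper are all hypothetical, made up purely for this example.

```python
# Hypothetical validation rules, described as metadata rather than hard-coded.
RULES = [
    {"column": "Year", "kind": "range", "min": 1990, "max": 2100},
    {"column": "Month", "kind": "range", "min": 1, "max": 12},
    {"column": "Revenue", "kind": "not_null"},
]


def validate(row, rules):
    """Return a list of failure messages for one row (a dict of column values)."""
    failures = []
    for rule in rules:
        column = rule["column"]
        value = row.get(column)
        if rule["kind"] == "not_null":
            if value is None:
                failures.append(f"{column} is missing")
        elif rule["kind"] == "range":
            # A missing value also fails a range rule.
            if value is None or not (rule["min"] <= value <= rule["max"]):
                failures.append(
                    f"{column}={value!r} outside {rule['min']}..{rule['max']}"
                )
        else:
            raise ValueError(f"unknown rule kind: {rule['kind']}")
    return failures
```

The same `validate` function would serve every data collection; only the rule list changes, which is the "what" separated from the "how".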

So an initial approach of specifying user requirements, designing the data warehouse and building it iteratively was not as appealing to me this time around. Having been involved in a few data warehouse projects I have come to learn that the initial design is generally flawed and needs modification in order to be supported into the future. This approach is generally focused on the schema of the data warehouse. Spec the needs, design the schemas, build the schemas, and tack on some ETL logic to make it work. Almost as a rule, it doesn't, but that's OK. That's what iterative development is all about. But in this case, the way the system works is already spliced into the schema of how we want the data presented to the user... This means that if we need to fundamentally change the way we treat our dimensions, we have to change reams of code and try not to break anything... and plenty of other nasties.

At the system level, a typical system design goal is usually something like this - "Create a data warehouse that allows users to get access to data collected by the corporate financial data application..." The unfortunate temptation is to go off and spec a corporate financial data warehouse. But - what is "corporate financial"? The conceptual answer (eg the column names, data items, business rules etc) usually surfaces - Corporate financial is "Year","Month","Revenue" etc. So in many cases, the coding is also done at the conceptual level. The real answer to our question is that Corporate Financial is a data collection - that can be easily described with metadata as a collection of inputs, outputs, definitions, relationships, procedures and rules.
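A minimal sketch of what "described with metadata" could look like, assuming the collection is a plain Python dictionary and the warehouse is loaded from a staging table. The table names, the `role` field and the helper functions are hypothetical; SQLite is used only so that the generated SQL can be run.

```python
import sqlite3

# Hypothetical metadata for the "corporate financial" data collection.
COLLECTION = {
    "name": "corporate_financial",
    "source": "staging_financial",
    "columns": [
        {"name": "Year", "type": "INTEGER", "role": "dimension"},
        {"name": "Month", "type": "INTEGER", "role": "dimension"},
        {"name": "Revenue", "type": "REAL", "role": "measure"},
    ],
}


def create_table_sql(meta):
    """Generate the DDL for the warehouse table from the column metadata."""
    cols = ", ".join(f'"{c["name"]}" {c["type"]}' for c in meta["columns"])
    return f'CREATE TABLE "{meta["name"]}" ({cols})'


def load_sql(meta):
    """Generate a basic ETL step: group by dimensions, sum the measures."""
    dims = [c["name"] for c in meta["columns"] if c["role"] == "dimension"]
    measures = [c["name"] for c in meta["columns"] if c["role"] == "measure"]
    target = ", ".join(f'"{n}"' for n in dims + measures)
    select = ", ".join(
        [f'"{n}"' for n in dims] + [f'SUM("{n}") AS "{n}"' for n in measures]
    )
    group_by = ", ".join(f'"{n}"' for n in dims)
    return (
        f'INSERT INTO "{meta["name"]}" ({target}) '
        f'SELECT {select} FROM "{meta["source"]}" GROUP BY {group_by}'
    )


def run_demo():
    """Build and load the warehouse table in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE "staging_financial" '
        '("Year" INTEGER, "Month" INTEGER, "Revenue" REAL)'
    )
    conn.executemany(
        'INSERT INTO "staging_financial" VALUES (?, ?, ?)',
        [(2004, 9, 10.0), (2004, 9, 5.0), (2004, 8, 1.0)],
    )
    conn.execute(create_table_sql(COLLECTION))
    conn.execute(load_sql(COLLECTION))
    rows = conn.execute(
        'SELECT "Year", "Month", "Revenue" FROM "corporate_financial" '
        'ORDER BY "Year", "Month"'
    ).fetchall()
    conn.close()
    return rows
```

Under this sketch, a new data item would be one more entry in `columns` rather than a hand edit to the SQL scripts.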

It is at about this point that the whole thing turns sour - by ignoring the basic facts about the metadata of what we want to present, we lock ourselves into a certain way of doing something, which can be a nightmare to develop, debug and deploy.

The common result of this solution is that the project is never 'finished'. The end of the financial year rolls around, and the users want new items in the data collection... It's up to the developers and DBAs to hack the scripts to get them through. What kind of development team wants to be doing code maintenance ad infinitum? That detracts from time spent developing new stuff. The goal should be to pass the bulk of the logical & conceptual maintenance tasks back to the application owners.

The MD approach generally takes longer than seat-of-your-pants coding, but delivers long-term gain. It should also be noted that MD doesn't fix all bugs; it just limits their occurrence to bad modeling decisions, rather than coding errors.

This type of approach borrows from and extends many others - CWM, Configuration Management, Virtual Machines, Rule-Based programming and more. This stuff isn't rocket science, nor is it a new concept, but it seems many architects are still avoiding MD solutions.

In my next post I'll talk about a few simple ways to implement MD design in different applications.

Sunday, September 19, 2004

Idle Weekend

This was most likely the last idle weekend I'll have before the moving activities really kick in. I spent it mostly doing reading and research on the web; that included catching up on some episodes of the .Net show and MSDN TV that I have been neglecting lately... also managed to wash the car and the bike. Watched two of the best games of footy I've seen all year, St Kilda v Port Adelaide and Brisbane v Geelong, on Friday and Saturday night respectively.

Sunday was fantastic - sunny and clear in the afternoon, so Geoff, my father-in-law, and I went on an impromptu ride up to Kinglake and down through Yarra Glen. We got stuck behind a fair few slow cars on the way up the hill, and the same on the way down... still, a good way to waste an afternoon. The old girl (Yamaha SR250) did a respectable job, but I still can't wait to get rid of it and get something with a bit more grunt. Contemplating an Aprilia 125 or 250; can't see myself really needing the power of a litre sports bike (although 600 would be nice) given the huge cost of insurance, the fact that I won't be doing any track days, and that I generally respect speed limits.

The most likely situation is that the old girl will be running for a few years yet...

Wednesday, September 15, 2004

Bargain of the century...

I made my regular jaunt to the bookstore today; once a week or so I browse the bargain bins for any gems that are hidden away. I don't spend a heap of money on books; I will buy the odd new one for full price if it is a real gem, or if it is pertinent to a solution I am working on or researching. I tend to buy the discounted books that are still relevant...

Anyway, I walked into the basement level to find the bargain bin full of some fantastic tomes from the embattled Wrox label.

Almost $1000 RRP for these seven books - a steal at $10 each! Here is a pic of those useful titles...

Apparently the stock that was discounted didn't last... and word is there were a few scraps over the remaining few...

Monday, September 13, 2004

Melbourne Weather

Amanda and I took this weekend to go down to Blairgowrie where we planned on having what will most likely be our last relaxing weekend before we move. We stayed in a house belonging to a family friend, which was great apart from trying to get the wood heater going at 9pm on Friday night (no central heating).

Well, to say it rained this weekend would be an understatement. When it wasn't raining, it was hailing, and vice versa. We did get down to the beach on Saturday afternoon; we lasted about 5 minutes on the shore before the buffeting Antarctic winds proved too much...

Tip: Don't holiday anywhere in Victoria in September or October.

Wednesday, September 08, 2004

Movin on up...

It is official - Amanda and I are moving to Queensland...

After much deliberation and careful planning we made the decision to pack up everything we own and head north. Amanda has landed a job with a local company on the Gold Coast, so this isn't a flight of fancy by any means, although I am looking forward to the warmer weather (how long until I start cursing it?!).

The things I will miss most here in Melbourne are my family and my beloved cricket club. Not having many contacts up there I will be starting out from square one. I am considering staying in the Health IT industry, but I am keen to see the view from other areas.

Also looking for a new cricket club up there...

Monday, September 06, 2004

CODE - by Charles Petzold

I am officially a huge fan of Charles Petzold.

I have just bought his book, CODE, which I heard about thanks to rave reviews from Rory and Carl on DNR.

I don't splash out on a lot of books, but this one looks so good that I can give it to my kids to read, and it will still be relevant.

Blog tools

So I had a go at setting up .TEXT on my PC at home (running Windows XP SP2), hosted through my 1.5Mbps Optusnet cable connection. The thing was, Optusnet have blocked port 80 due to all the worms that were (and probably still are!) floating around the Windows world. So I had to fire up my site on a different port (I used 8080), which is a pain to link to, as we all know.

The bottom line was, .TEXT was a bit of a pain to get up and running. Actually, even after about 20 goes, I still hadn't got it going (maybe it was port 8080?). Just as I was about to download the source version of .TEXT and prepare to hack away, I thought better of it and went with Blogger.
(To Scott W - .text does rock though!)

Very happy so far. I initially started off with the intention of hosting the blog on my FTP server, but again I thought better of it. Why not have all my blogs managed, DRPed and up 24/7?

And it's free... which was nice

Let there be light

I made a new-financial-year's resolution to start blogging... Here it is, albeit late...

I promise to include at least one useful and insightful post at least once on this blog. Promise!