
re: Debunking Kimball Effective Dates


I second the suggestion that this technique is not suitable where dimension rows are expired when no facts arrive for them (the monthly snapshot paradigm, for example). In that case there is no place to record the end date, and no expired flag. Of course, there is room for debate about what the dimensional behavior should be, and about why one would want every dimension member that did not arrive to be expired.
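The expiry described above needs a column to write to. A minimal sketch, assuming a conventional Kimball-style table that stores an expiry date and a current flag (the table `account_dim` and the function `expire_missing` are hypothetical names), shows that expiring every member absent from a snapshot is a single UPDATE. An insert-only table whose end dates are derived from the next version has no such column, so the same step has nowhere to land.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical dimension that stores its own expiry date and current flag.
conn.executescript("""
CREATE TABLE account_dim (
    account_key    INTEGER PRIMARY KEY,
    account_id     TEXT NOT NULL,     -- natural key
    effective_date TEXT NOT NULL,     -- ISO-8601 date
    expiry_date    TEXT,              -- NULL while the row is open
    is_current     INTEGER NOT NULL   -- 1 for the active version
);
""")

def expire_missing(arrived_ids, snapshot_date):
    """Expire every current row whose natural key did not arrive in this snapshot.

    Returns the number of rows expired. Using the snapshot date as the
    expiry date is an assumption for illustration.
    """
    # Stage the natural keys that did arrive in a temporary table.
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS arrived (account_id TEXT PRIMARY KEY)")
    conn.execute("DELETE FROM arrived")
    conn.executemany("INSERT INTO arrived VALUES (?)", [(i,) for i in set(arrived_ids)])
    # Close out every open row that is missing from this snapshot.
    cur = conn.execute(
        "UPDATE account_dim SET expiry_date = ?, is_current = 0 "
        "WHERE is_current = 1 AND account_id NOT IN (SELECT account_id FROM arrived)",
        (snapshot_date,),
    )
    return cur.rowcount
```

For example, with open rows for accounts A and B, calling `expire_missing(["A"], "2024-02-29")` expires only B's row and leaves A current.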

Where the technique can be used, the dimension processing step has less to update (it is INSERT-only). However, identifying the current dimension row during ETL, which is needed for the comparison that decides whether to insert a new row or reuse the existing key, would seem to be significantly slower, because it would have to go through this view (a view I would expect to be rarely used in reporting and analysis). So are we optimizing for an uncommon problem at the expense of our regular ETL performance?
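A minimal sketch of where that ETL cost comes from, using Python's built-in `sqlite3` module. The table `customer_dim`, the view `customer_dim_v` and the function `lookup_or_insert_derived` are hypothetical names, and deriving the end date with a correlated subquery is only one assumed way such a view could be written. The point is that the insert-or-reuse decision has to evaluate the derived end date for every lookup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical insert-only dimension: only the effective (start) date is stored.
conn.executescript("""
CREATE TABLE customer_dim (
    customer_key   INTEGER PRIMARY KEY,
    customer_id    TEXT NOT NULL,  -- natural key
    city           TEXT NOT NULL,  -- tracked (type 2) attribute
    effective_date TEXT NOT NULL   -- ISO-8601 date; no end date or current flag stored
);
-- End date derived from the next version's start; the current row has no successor.
CREATE VIEW customer_dim_v AS
SELECT d.customer_key, d.customer_id, d.city, d.effective_date,
       (SELECT MIN(n.effective_date)
          FROM customer_dim n
         WHERE n.customer_id = d.customer_id
           AND n.effective_date > d.effective_date) AS end_date
  FROM customer_dim d;
""")

def lookup_or_insert_derived(customer_id, city, load_date):
    """Return the surrogate key for a fact row, adding a new version if the attribute changed."""
    # Finding the current row means evaluating the derived end date through the view.
    row = conn.execute(
        "SELECT customer_key, city FROM customer_dim_v "
        "WHERE customer_id = ? AND end_date IS NULL",
        (customer_id,),
    ).fetchone()
    if row is not None and row[1] == city:
        return row[0]  # unchanged: reuse the current key
    # Changed or new: INSERT only; the previous version's end date is implied by this row.
    cur = conn.execute(
        "INSERT INTO customer_dim (customer_id, city, effective_date) VALUES (?, ?, ?)",
        (customer_id, city, load_date),
    )
    return cur.lastrowid
```

For example, loading C1/Leeds on 2024-01-01 and again on 2024-02-01 returns the same key, while loading C1/York on 2024-03-01 adds a second row, after which the view reports 2024-03-01 as the first row's end date. With a stored current flag, the same lookup would be a plain indexed filter rather than a per-row subquery.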

Whilst I have used similar designs in OLTP systems with complete temporal consistency, where a new versioned entry is always created and old ones simply remain, I have not used this approach in a traditional, reporting-user-facing data warehouse.

In addition, the Kimball methodology still works when loads are run out of order (for example, a reload of a day whose data turned out to be bad), although the dates will get skewed. With this technique I am less sure what the behavior would be, because the current flag is not independent of the dates. While most data warehouses are loaded back to front, I have never seen a case in our environment where the warehouse was rolled back and then reloaded forward to preserve the dimensional behavior at all costs. Usually only the bad fact data in the middle was purged and reloaded with the regular package; dimensional behavior was not accounted for, and no attempt was made to ensure that the dimension row chosen was one that would have been active at the time. Instead, a regular dimension cycle was allowed to run against whatever dimension rows were active.

I don't agree 100% with the design of the data warehouse I use every day, but I think I would continue to use both date columns and a current flag. Note that we don't use datetime columns: we use foreign keys into the date dimension (with the YYYYMMDD natural representation of the dimension keys that Kimball discusses), and this technique generally improves performance for most date-only columns.
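A minimal sketch of the design described above: effective and expiry dates stored as YYYYMMDD integer keys into a date dimension, plus a current flag. The table and function names, the 99991231 sentinel for an open row, and the convention of expiring the old row on the day before the new version takes effect are all assumptions for illustration.

```python
import sqlite3
from datetime import date, timedelta

def date_key(d: date) -> int:
    """YYYYMMDD integer key into a date dimension, e.g. 2024-03-01 -> 20240301."""
    return d.year * 10000 + d.month * 100 + d.day

OPEN_END = 99991231  # assumed sentinel expiry key for the current row

conn = sqlite3.connect(":memory:")
# Hypothetical type 2 dimension storing both date keys and a current flag.
conn.executescript("""
CREATE TABLE customer_dim (
    customer_key       INTEGER PRIMARY KEY,
    customer_id        TEXT NOT NULL,     -- natural key
    city               TEXT NOT NULL,     -- tracked (type 2) attribute
    effective_date_key INTEGER NOT NULL,  -- FK to date dimension (YYYYMMDD)
    expiry_date_key    INTEGER NOT NULL,  -- FK to date dimension (YYYYMMDD)
    is_current         INTEGER NOT NULL   -- 1 for the active version
);
CREATE INDEX ix_customer_current ON customer_dim (customer_id, is_current);
""")

def lookup_or_insert_flagged(customer_id: str, city: str, load_date: date) -> int:
    """Return the surrogate key for a fact row, versioning the row if the attribute changed."""
    # The current row is found with a simple indexed filter on the flag.
    row = conn.execute(
        "SELECT customer_key, city FROM customer_dim WHERE customer_id = ? AND is_current = 1",
        (customer_id,),
    ).fetchone()
    if row is not None and row[1] == city:
        return row[0]  # unchanged: reuse the current key
    if row is not None:
        # Expire the old version on the day before the new one takes effect.
        conn.execute(
            "UPDATE customer_dim SET expiry_date_key = ?, is_current = 0 WHERE customer_key = ?",
            (date_key(load_date - timedelta(days=1)), row[0]),
        )
    cur = conn.execute(
        "INSERT INTO customer_dim "
        "(customer_id, city, effective_date_key, expiry_date_key, is_current) "
        "VALUES (?, ?, ?, ?, 1)",
        (customer_id, city, date_key(load_date), OPEN_END),
    )
    return cur.lastrowid
```

Because YYYYMMDD integers sort in date order, range predicates such as `effective_date_key <= 20240315 AND expiry_date_key >= 20240315` work directly on the keys without joining to the date dimension. The trade-off is the extra UPDATE per change, which is exactly what the insert-only approach avoids.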

What this whole debate highlights for me is that there is still no single best methodology in data warehousing, and the best practices we have do not all agree.

