re: Debunking Kimball Effective Dates

Hi folks,

Just thought I'd post an alternative approach that guarantees the correct dates and also provides a row-versioning metric.  For the record - this approach is optimized for processing the data during OLTP processes rather than during ETL.  In doing so, it makes the ETL process much simpler (just load anything that hasn't already been loaded).
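To make "just load anything that hasn't already been loaded" a bit more concrete, here is a rough sketch of what the warehouse-side load could look like against the test table defined further down.  The dim_test table is purely hypothetical (it is assumed to have the same columns as test), and this is only one way to do it, not a tested ETL job.

```sql
-- dim_test is a hypothetical warehouse table with the same columns as test.

-- 1. Copy across any (Id, RowVer) versions the warehouse has not seen yet.
INSERT INTO dim_test (Id, RowVer, ActiveFrom, ActiveTo, Descriptor)
SELECT s.Id, s.RowVer, s.ActiveFrom, s.ActiveTo, s.Descriptor
FROM test AS s
WHERE NOT EXISTS
(
    SELECT 1
    FROM dim_test AS d
    WHERE d.Id = s.Id
      AND d.RowVer = s.RowVer
);

-- 2. Pick up end dates for versions that were closed since the previous load.
UPDATE d
SET d.ActiveTo = s.ActiveTo
FROM dim_test AS d
    JOIN test AS s
        ON s.Id = d.Id
       AND s.RowVer = d.RowVer
WHERE d.ActiveTo IS NULL
  AND s.ActiveTo IS NOT NULL;
```

Because the source rows are already versioned and dated, the load does no change detection of its own; it only compares (Id, RowVer) pairs.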

I am not a fan of including an end-date in a type-II SCD table unless it has a definitive value - I generally make Date-Effective-To (or Active-To in my example below) a NULL until a value has been applied.  Filtering for NULL in this column effectively replaces the search for an "Is Current" flag.
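As a quick illustration of the NULL end-date convention, the queries below use the test table and column names from my example further down.  The @AsOf variable and its value are made up for the illustration.

```sql
-- Current version of every record: ActiveTo IS NULL stands in for an "Is Current" flag.
SELECT Id, RowVer, Descriptor, ActiveFrom
FROM test
WHERE ActiveTo IS NULL;

-- Version that was in effect at a given moment (@AsOf is an illustrative value).
DECLARE @AsOf DATETIME2 = '2024-01-01T00:00:00';

SELECT Id, RowVer, Descriptor
FROM test
WHERE ActiveFrom <= @AsOf
  AND (ActiveTo IS NULL OR ActiveTo > @AsOf);
```

The second query shows the one cost of the convention: any point-in-time filter has to treat NULL as "still open" explicitly.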

Anyway - without any overly complex group-by operations or CTEs, I present my solution for implementing Type II SCD dimension processing using a simple trigger.  This trigger structure will work cleanly against any number of columns, and it also has some logic which prevents ID collisions and allows accurate processing of a number of rows simultaneously.  This logic helps to minimize the overhead of processing lots of atomic transactions against a

The RowVer field allows you to link fact table entries (or the transactional prototype tables from which fact tables are defined) to not just the ID but also the version of the record associated with that ID, which is really the point of Type II SCDs: linking facts to the record versions at a particular point in time.  I'm not saying this solution is perfect, but it enforces all of the constraints required and has been tested over multiple concurrent queries to verify the ID collision prevention.  Feedback on further fine-tuning is welcome.  Without further ado - here it is:

=========================================================================

IF OBJECT_ID('test') IS NOT NULL
    DROP TABLE test
GO

CREATE TABLE test
(
    [Id] BIGINT NULL,                                   -- record ID; assigned by the trigger when omitted
    [RowVer] INT NOT NULL DEFAULT @@DBTS,               -- version number; the trigger sets the real value
    [ActiveFrom] DATETIME2 NOT NULL DEFAULT getdate(),
    [ActiveTo] DATETIME2 NULL,                          -- NULL = current version
    [Descriptor] nvarchar(200) NOT NULL UNIQUE CLUSTERED
)
GO

CREATE TRIGGER test_trigger ON test
INSTEAD OF INSERT, UPDATE, DELETE
AS
SET NOCOUNT ON

-- Pair each inserted row with its deleted counterpart and classify the operation:
-- 'I' = insert with no Id supplied, 'U' = update (or insert with an explicit Id), 'D' = delete.
SELECT
    ROW_NUMBER() OVER (ORDER BY i.[Id], d.[Id]) AS RowNumber,
    CASE
        WHEN i.[Id] IS NULL AND d.[Id] IS NULL THEN 'I'
        WHEN i.[Id] IS NOT NULL THEN 'U'
        WHEN i.[Id] IS NULL AND d.[Id] IS NOT NULL THEN 'D'
    END AS Operation,
    i.[Id] AS [InsertedID],
    d.[Id] AS [DeletedID],
    i.[RowVer] AS [InsertedRowVer],
    d.[RowVer] AS [DeletedRowVer],
    i.[Descriptor] AS [InsertedDescriptor],
    d.[Descriptor] AS [DeletedDescriptor]
INTO #RowsToProcess
FROM inserted i
    FULL JOIN deleted d ON i.[Id] = d.[Id]

DECLARE @RowToProcess INT = 1;
DECLARE @RowCount INT = (SELECT COUNT(*) FROM #RowsToProcess)
DECLARE @CurrentVersion INT
DECLARE @CurrentID BIGINT
DECLARE @Operation CHAR(1)
DECLARE @MaxId BIGINT                  -- BIGINT to match the Id column
DECLARE @CurrentDescriptor NVARCHAR(200)
DECLARE @Now DATETIME2 = GETDATE()     -- one timestamp, so a closed version's ActiveTo equals its successor's ActiveFrom

-- Empty staging table with the same columns as test
SELECT * INTO #ins FROM test
WHERE 1 = 0;

WHILE (@RowToProcess <= @RowCount)
BEGIN
    SELECT
        @Operation = Operation,
        @CurrentID = COALESCE(InsertedID, DeletedID),
        @CurrentVersion = COALESCE([DeletedRowVer], 0),
        @CurrentDescriptor = COALESCE([InsertedDescriptor], [DeletedDescriptor])
    FROM #RowsToProcess
    WHERE RowNumber = @RowToProcess;

    IF @Operation = 'I'
    BEGIN
        -- Take the next Id under UPDLOCK/HOLDLOCK to prevent ID collisions between concurrent inserts.
        -- COALESCE covers the empty-table case, where MAX(Id) is NULL.
        SELECT @MaxId = COALESCE(MAX(Id), 0) + 1 FROM test WITH (UPDLOCK, HOLDLOCK);
        PRINT CAST(@MaxId AS NVARCHAR(20));

        INSERT INTO #ins SELECT * FROM inserted WHERE Descriptor = @CurrentDescriptor;

        UPDATE #ins SET Id = @MaxId, RowVer = 1
        WHERE Descriptor = @CurrentDescriptor

        INSERT INTO test
        SELECT * FROM #ins;

        DELETE FROM #ins;
    END
    ELSE IF @Operation = 'U'
    BEGIN
        -- Close the current version...
        UPDATE test
        SET ActiveTo = @Now
        WHERE Id = @CurrentID
            AND ActiveTo IS NULL;

        -- ...and add the new values as the next version.
        INSERT INTO #ins SELECT * FROM inserted WHERE Id = @CurrentID;

        UPDATE #ins
        SET
            ActiveFrom = @Now,
            RowVer = @CurrentVersion + 1
        WHERE Descriptor = @CurrentDescriptor;

        INSERT INTO test
        SELECT * FROM #ins
        WHERE Id = @CurrentID;

        DELETE FROM #ins;
    END
    ELSE IF @Operation = 'D'
    BEGIN
        -- Nothing is physically deleted; the current version is simply closed.
        UPDATE test
        SET
            ActiveTo = @Now
        WHERE Id = @CurrentID
            AND ActiveTo IS NULL
    END

    SET @RowToProcess = @RowToProcess + 1;
END
GO

=========================================================================

Some demo data to show off the capabilities of the approach.

=========================================================================

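-- Explicit Ids: the trigger keeps the supplied Id and starts each record at RowVer 1.
INSERT INTO test (Id, Descriptor)
VALUES (1, 'Bob'), (2, 'Mary'), (3, 'Marianne'), (4, 'Peter'), (5, 'Alison'), (6, 'Jane');
GO

SELECT * FROM test;

-- No Id supplied: the trigger assigns MAX(Id) + 1 to each row.
INSERT INTO test (Descriptor) VALUES ('Brian'), ('Arthur'), ('Albert');

-- Update: closes the current version and adds a new one with RowVer + 1.
UPDATE test
SET Descriptor = REPLACE(Descriptor, 'Mar', 'Cal') WHERE Descriptor LIKE 'Mar%';

-- Delete: no rows are removed; ActiveTo is set on the current version of each matching Id.
DELETE FROM test
WHERE Descriptor IN ('Mary', 'Bob', 'Brian');

SELECT * FROM test;
GO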
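To show what linking a fact to a specific record version might look like, here is a small sketch that runs against the demo data above.  The fact_order table, its columns and the values inserted are all hypothetical and exist only for illustration.  There is no foreign key because test has no unique constraint on (Id, RowVer).

```sql
-- Hypothetical fact table; names are illustrative only.
CREATE TABLE fact_order
(
    OrderId    BIGINT         NOT NULL PRIMARY KEY,
    TestId     BIGINT         NOT NULL,  -- test.Id
    TestRowVer INT            NOT NULL,  -- test.RowVer in effect when the fact was recorded
    Amount     DECIMAL(18, 2) NOT NULL
);

-- Record a fact against whichever version of Id 4 is current right now.
INSERT INTO fact_order (OrderId, TestId, TestRowVer, Amount)
SELECT 1001, t.Id, t.RowVer, 25.00
FROM test AS t
WHERE t.Id = 4
  AND t.ActiveTo IS NULL;

-- Report each fact with the descriptor as it stood when the fact was recorded.
SELECT f.OrderId, f.Amount, t.Descriptor, t.ActiveFrom, t.ActiveTo
FROM fact_order AS f
    JOIN test AS t
        ON t.Id = f.TestId
       AND t.RowVer = f.TestRowVer;
```

Later updates to Id 4 add new versions to test, but the fact row keeps pointing at the version it was recorded against.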

