Hi folks,
Just thought I'd post an alternative approach that guarantees the correct dates and also provides a row-versioning metric. For the record - this approach is optimized for processing the data during OLTP operations, rather than during ETL. That makes the ETL process much simpler (just load anything that hasn't already been loaded).
I am not a fan of including an end-date in a type-II SCD table unless it has a definitive value - I generally make Date-Effective-To (or Active-To in my example below) a NULL until a value has been applied. Filtering for NULL in this column effectively replaces the search for an "Is Current" flag.
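To make the NULL convention concrete, here is a minimal sketch of the two lookups it supports, written against the test table defined further down. The @AsOf variable and its value are just placeholders for illustration, not part of the solution itself.

```sql
-- Current version of every record: ActiveTo IS NULL stands in for an "Is Current" flag.
SELECT [Id], [RowVer], [Descriptor], [ActiveFrom]
FROM test
WHERE [ActiveTo] IS NULL;

-- Version that was in effect at a given moment.
-- @AsOf is a hypothetical parameter; substitute whatever point in time you need.
DECLARE @AsOf DATETIME2 = '2015-01-01T00:00:00';

SELECT [Id], [RowVer], [Descriptor], [ActiveFrom], [ActiveTo]
FROM test
WHERE [ActiveFrom] <= @AsOf
  AND ([ActiveTo] IS NULL OR [ActiveTo] > @AsOf);
```

Note that a deleted record has ActiveTo set on its last version, so it drops out of the first query without any extra flag.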
Anyway - without any overly complex group-by operations or CTEs, I present my solution for implementing type-II SCD processing using a simple trigger. This trigger structure works cleanly against any number of columns, and it also has some logic that prevents ID collisions and allows accurate processing of a number of rows in a single statement. This logic helps to minimize the overhead of processing lots of atomic transactions against a
The RowVer field allows you to link fact table entries (or the transactional prototype tables from which fact tables are defined) not just to the ID, but to the version of the record associated with that ID, which is really the point of Type II SCDs - linking facts to the record versions at a particular point in time. I'm not saying this solution is perfect, but it enforces all of the constraints required and has been tested over multiple concurrent queries to verify the ID collision prevention. Feedback on further fine-tuning is welcome. Without further ado - here it is:
=========================================================================
IF OBJECT_ID('test') IS NOT NULL
DROP TABLE test
GO
CREATE TABLE test
(
[Id] BIGINT NULL,
[RowVer] INT NOT NULL DEFAULT @@DBTS,
[ActiveFrom] DATETIME2 NOT NULL DEFAULT getdate(),
[ActiveTo] DATETIME2 NULL,
[Descriptor] nvarchar(200) NOT NULL UNIQUE CLUSTERED
)
GO
CREATE TRIGGER test_trigger ON test
INSTEAD OF INSERT, UPDATE, DELETE
AS
SET NOCOUNT ON
SELECT
ROW_NUMBER() OVER (ORDER BY i.[Id], d.[Id]) as RowNumber,
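CASE
-- No Id on either side: a new record that still needs an Id allocated.
WHEN i.[Id] IS NULL AND d.[Id] IS NULL THEN 'I'
-- An inserted Id: an update, or an insert that supplied its own Id.
WHEN i.[Id] IS NOT NULL THEN 'U'
-- Only a deleted Id: a delete, handled by closing the current version.
WHEN i.[Id] IS NULL AND d.[Id] IS NOT NULL THEN 'D'
END as Operation,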
i.[Id] as [InsertedID],
d.[Id] as [DeletedID],
i.[RowVer] as [InsertedRowVer],
d.[RowVer] as [DeletedRowVer],
i.[Descriptor] as [InsertedDescriptor],
d.[Descriptor] as [DeletedDescriptor]
INTO #RowsToProcess
FROM
inserted i FULL JOIN deleted d
ON i.[Id] = d.[Id]
DECLARE @RowToProcess int = 1;
DECLARE @RowCount int = (SELECT COUNT(*) FROM #RowsToProcess)
DECLARE @CurrentVersion INT
DECLARE @CurrentID BIGINT
DECLARE @Operation char(1)
DECLARE @maxid BIGINT -- matches the BIGINT type of test.[Id]
DECLARE @CurrentDescriptor nvarchar(200)
SELECT * INTO #ins FROM test
WHERE 1 = 0;
WHILE (@RowToProcess <= @RowCount)
BEGIN
SELECT
@Operation = Operation,
@CurrentID = COALESCE(InsertedID, DeletedID),
@CurrentVersion = COALESCE([DeletedRowVer], 0),
@CurrentDescriptor = COALESCE([InsertedDescriptor], [DeletedDescriptor])
FROM #RowsToProcess
WHERE RowNumber = @RowToProcess;
IF @Operation = 'I'
BEGIN
-- COALESCE covers an empty table, where MAX(Id) would otherwise return NULL.
SELECT @maxid = COALESCE(MAX(Id), 0) + 1 FROM test WITH (UPDLOCK, HOLDLOCK);
PRINT CAST(@maxid as nvarchar(20));
INSERT INTO #ins SELECT * FROM inserted WHERE Descriptor = @CurrentDescriptor;
UPDATE #ins SET Id = @maxid, RowVer = 1
WHERE Descriptor = @CurrentDescriptor
INSERT INTO test
SELECT * FROM #ins;
DELETE FROM #ins;
END
ELSE IF @Operation = 'U'
BEGIN
UPDATE test
SET ActiveTo = getdate()
WHERE Id = @CurrentID
AND ActiveTo IS NULL;
INSERT INTO #ins SELECT * FROM inserted WHERE Id = @CurrentID;
UPDATE #ins
SET
ActiveFrom = getdate(),
RowVer = @CurrentVersion + 1
WHERE Descriptor = @CurrentDescriptor;
INSERT INTO test
SELECT * FROM #ins
WHERE Id = @CurrentID;
DELETE FROM #ins;
END
ELSE If @Operation = 'D'
BEGIN
UPDATE test
SET
ActiveTo = GetDate()
WHERE Id = @CurrentID
AND ActiveTo IS NULL
END
SET @RowToProcess = @RowToProcess + 1;
END
GO
=========================================================================
Some demo data to show off the capabilities of the approach.
=========================================================================
INSERT INTO test (Id, Descriptor)
VALUES (1, 'Bob'), (2, 'Mary'), (3, 'Marianne'), (4, 'Peter'), (5, 'Alison'), (6, 'Jane');
GO
SELECT * FROM test;
INSERT INTO test (Descriptor) VALUES ('Brian'), ('Arthur'), ('Albert');
UPDATE test
SET Descriptor = REPLACE (Descriptor, 'Mar', 'Cal') WHERE Descriptor Like 'Mar%';
DELETE FROM test
WHERE Descriptor IN ('Mary', 'Bob', 'Brian');
SELECT * FROM test;
GO
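To show what the RowVer column buys you, here is a rough sketch of a fact table that stores both the Id and the RowVer of the record it refers to. The fact_example table, its columns and the sample amount are all made up for illustration; only the test table and its columns come from the script above.

```sql
-- Hypothetical fact table, not part of the original solution.
CREATE TABLE fact_example
(
    [FactId]     BIGINT IDENTITY(1,1) PRIMARY KEY,
    [TestId]     BIGINT NOT NULL,        -- test.[Id]
    [TestRowVer] INT NOT NULL,           -- test.[RowVer] when the fact was recorded
    [Amount]     DECIMAL(18,2) NOT NULL  -- placeholder measure
);
GO

-- Record a fact against whichever version of 'Peter' is current right now.
INSERT INTO fact_example ([TestId], [TestRowVer], [Amount])
SELECT t.[Id], t.[RowVer], 42.00
FROM test t
WHERE t.[Descriptor] = 'Peter'
  AND t.[ActiveTo] IS NULL;
GO

-- Join on Id AND RowVer so each fact resolves to the exact version it was
-- recorded against, even after the record has been updated again.
SELECT f.[FactId], f.[Amount], t.[Descriptor], t.[ActiveFrom], t.[ActiveTo]
FROM fact_example f
JOIN test t
    ON t.[Id] = f.[TestId]
   AND t.[RowVer] = f.[TestRowVer];
GO
```

Because the join uses both columns, later updates to the record add new versions without changing what existing facts point at.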