Temporal Tables in SQL Server 2016

A Simple Question

One of the most frequent questions that comes up in the realm of enterprise applications is:

Who has touched/modified this data?

Short questions usually have long answers, and this one is no exception. To understand the implications of this question, one has to understand the principle of temporal versioning. This essentially means that time is treated as a dimension over which changes to data are persisted and observed. A full discussion on this topic can be found here. This post identifies a new feature in SQL Server 2016 called temporal tables and contrasts it with the manual process of creating and managing temporal tables in previous versions of SQL Server. The following example provides a basis for further discussion which in turn answers the question.

Imagine a table with the following schema:

CREATE TABLE [dbo].[Orders]

(

    [Id] [int] IDENTITY(1,1) NOT NULL, 

    [Amount] [money] NOT NULL, 

    [Comments] [nvarchar](4000) NULL, 

    [LastModifiedBy] [nvarchar](255) NOT NULL, 

    CONSTRAINT [PK_Orders] PRIMARY KEY CLUSTERED ([Id] ASC) 

)

A simple orders table with Id as the primary key, Amount to track the money spent on an order, a Comments column to track qualitative details about the order, and a LastModifiedBy column to track who modified it.

In the process of consuming this table, imagine that a row was inserted:

INSERT INTO [dbo].Orders ([Amount], [Comments], [LastModifiedBy])

VALUES (5.32, 'Tall Caramel Macchiato', 'bob.blue@example.com')

If we ask the question who modified this data, the answer is simple: the user bob.blue@example.com was the person who last touched this row. Now assume that this row was inserted with an error and that the amount needs to be updated from 5.32 to 4.89. Also assume that the correction was completed by a different user as follows:

UPDATE [dbo].[Orders]

SET [Amount] = 4.89, [LastModifiedBy] = 'alice.green@example.com'

WHERE [Id] = 1 -- 1 being the ID of theorder row we just inserted.

If we ask the same question again, the answer from the system's perspective is simple: alice.green@example.com is the user that last touched this row. While the answer from the system's perspective is simple, it is also incomplete. More often than not, due to compliance reasons, businesses not only want to know who modified the data the last time, but also who modified it since it was inserted into the table.

A naive implementation would be to simply attach a timestamp column with well-defined semantics of how to determine valid rows versus invalid ones. But in the process of doing this, we cannot maintain Id as a primary key because we end up violating the uniqueness constraint. Including the timestamp column in the primary key is also just as bad because it confounds the process of maintaining foreign key relationships against the table.

So what is the solution then?

Prior to SQL Server 2016

In earlier versions of SQL Server, the solution was to create a table with a similar schema that was identified as a history table. The following schema defines a history table that corresponds to the Orders table that we established earlier:

CREATE TABLE [dbo].[Orders_History]

(

    [Id] [int] NOT NULL,

    [Amount] [money] NOT NULL,

    [Comments] [nvarchar](4000) NULL,

    [LastModifiedBy] [nvarchar](255) NOT NULL,

    [ValidFrom] [datetime2] NOT NULL,

    [ValidTo] [datetime2] NOT NULL

)

Note that Id is not a primary key and that this schema has two additional columns ValidFrom and ValidTo. These additional columns help identify the validity of the row over time. I have omitted additional DML triggers that would have to be written against the Orders table. However, the entire process would work as follows:

When a row is inserted into the Orders table, a row would also be inserted into for the Orders_History table with the same values for Id, Amount, Comments, and LastModifiedBy. Furthermore, ValidFrom would have a value of the time of insertion into the Orders table and ValidTo would have the value of EndOfTime (a symbolic value that indicates that this row is still valid, you can use a value like December 31st 9999 23:59:59).
If a row is modified in the Orders table, a row in the Orders_History table with a matching primary key, that is still valid based on the ValidTo value, is expired by updating its ValidTo value to the time of when the update occurred on the Orders table. After this, step 1 is repeated for the Orders_History table with the new values of the updated row.
If a row is deleted in the Orders table, we would perform the same procedure as step 2, except the ValidTo value would be updated to the time of deletion from the Orders table.

So for the series of events described earlier (just an insert and update on the Orders table), we would have to do something like this via triggers:

-- Initial insert

INSERT INTO [dbo].Orders_History 

(

    [Id],

    [Amount],

    [Comments],

    [LastModifiedBy], 

    [ValidFrom], 

    [ValidTo]

)

VALUES 

(

    1,

    5.32, 

    'Tall Caramel Macchiato', 

    'bob.blue@example.com', 

    GETDATE(), 

    '9999.12.31 23:59:59'

)

-- Update

DECLARE @now datetime2 = GETDATE()



UPDATE [dbo].Orders_History

SET [ValidTo] = @now

WHERE [Id] = 1 AND [ValidTo] = '9999.12.31 23:59:59'



INSERT INTO [dbo].Orders_History 

(

    [Id], 

    [Amount],

    [Comments],

    [LastModifiedBy],

    [ValidFrom], 

    [ValidTo]

)

VALUES 

(

    1, 

    4.89, 

    'Tall Caramel Macchiato', 

    'alice.green@example.com', 

    @now, 

    '9999.12.31 23:59:59'

)

At this point, if we are asked the same question again, we have the ability to dive into the history table and obtain information about all users that have modified a certain row of data in the OLTP Orders table.

While this feature allows us to provide a richer experience for end users of an enterprise application, it is non-trivial, very tedious, and stressful to manage especially if you're trying to evolve your schema.

SQL Server 2016 CTP 2

As of SQL Server 2016 CTP 2, all this has been taken care of for us. We now have the ability to define a history table without directly having to manage the schema of the history table (even though we have the option / flexibility to do so). This in turn allows us to work with OLTP tables / data transparently and develop richer features more efficiently.

For the Orders table we can simply create the table as follows:

CREATE TABLE [dbo].[Orders]

(

    [Id] [int] IDENTITY(1,1) NOT NULL,

    [Amount] [money] NOT NULL,

    [Comments] [nvarchar](4000) NULL,

    [LastModifiedBy] [nvarchar](255) NOT NULL,

    [ValidFrom] [datetime2] GENERATED ALWAYS AS ROW START NOT NULL,

    [ValidTo] [datetime2] GENERATED ALWAYS AS ROW END NOT NULL

    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)

    CONSTRAINT [PK_Orders] PRIMARY KEY CLUSTERED ([Id] ASC)

)

WITH (SYSTEM_VERSIONING = ON)

That's it! No complicated triggers or managing history tables independently. You can essentially set up a history table with 3 extra lines of code as shown below, and that is extremely powerful!

...

    [ValidFrom] [datetime2] GENERATED ALWAYS AS ROW START NOT NULL,

    [ValidTo] [datetime2] GENERATED ALWAYS AS ROW END NOT NULL

    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)

...

For more information on how you can use this feature to your advantage, see Getting Started with System-Versioned Temporal Tables.

The Path Forward

With this new feature, it's going to be so much easier to report on historical artifacts in database, from an audit perspective or a statistical perspective. One last note about the temporal feature is the general size of the database. Even though this feature is meant to capture data by slowly changing dimensions (SCD), slowly for any given businesses is a relative term, and database sizes can grow to be very large relatively quickly. While there is not much on the topic yet, I suspect that Stretch Database, another feature of SQL Server 2016, will play a key role in mitigating the size of databases with large history tables.