|
Return to Newsletter Contents...
Using the SSIS Slowly Changing Dimension Transformation
by: Sean Beal, Sr. Consultant, CRCP
Newly available in SQL Server Integration Services (SSIS)
is the Slowly Changing Dimension Transformation.
There are several types of slowly changing dimensions; let’s take a look
at using this transformation in the context of a Type 2 slowly changing
dimension.
With a Type 2 slowly changing dimension, we track not only
the current state of the record in question, but also historical instances by
adding a new record each time that there is change in an attribute (or group of
attributes) which we consider important.
Consider the following employee entity (our data source):

We’ve decided that in our data warehouse it is important to
track the fact that a particular employee happened to change their address.
We want to keep the original record and add a new record with the ‘fresh’
information. We also want to keep
track of which record is the current record in order to make later query work a
little easier. Of course, we’ll have
to add a surrogate key to our destination entity, because our ‘employee_id’ will
be duplicated every time we get updated data.
Our destination employee schema might look like this:

Let’s dive in!
We’ll create a new Integration Services Project, and drag a Data Flow Task onto
our Control Flow workspace. Double
clicking the Data Flow Task (or selecting it and clicking on the Data Flow tab)
will allow us to add the Slowly Changing Dimension transformation to our Data
Flow workspace. Of course, the
transformation will need an input source, so we’ll need to configure our data
source as well. In this case, the
data source will be our employee entity from the source system for employee
data, and we’ll use an OLE DB Source.
After configuring the OLE DB source and connecting it to the Slowly
Changing Dimension transformation, we should see something like this:

Now, double click on the Slowly Changing Dimension
transform, and we’ll be presented with the Slowly Changing Dimension Wizard.
The first step in the Wizard is to connect to our dimension
table (our destination). We’re going
to use the employee entity in our data warehouse, shown above.
We also need to choose a business key.
The business key is typically the primary key in the source system; in
our case, the business key will be the ‘employee_id’ field from our source
system.
The next step in the
Wizard is to set up the change types for our columns.
This is how we tell the transformation on a column by column basis what
to do with changes. The Wizard
presents us with three different change types from which to choose:
Fixed Attribute
– select this when the column value should not change.
Any changes detected in this column will be treated as an error.
Changing Attribute
– select this when changing values should overwrite existing values.
We’d choose this if we were doing a Type 1 slowly changing dimension.
Historical Attribute
– select this when changes to a column are saved as new records; previous
records are marked as outdated. This
is a Type 2 Slowly Changing dimension.
We’ll set each of our columns to the Historical Attribute
change type. On the next step of the
Wizard, we can choose the column we want to use to show which records are
current. In our case, we’ll use the
‘is_current’ column, and because it is a Boolean field, we’ll choose ‘True’ for
our value when current and ‘False’ for our expiration value.
We could also use start and end date columns to show when a record is
active or expired.
The next step of the Wizard allows us to configure what are
called Inferred Dimension Members.
These are used when a fact table is referencing dimension members that haven’t
yet been loaded. We won’t be using
this in our example, so let’s disable it.
We’re finished!
The Wizard will now indicate the outputs that will be created.
Because we are configured this transformation for a Type 2 Slowly
Changing Dimension, we should only have two:
one for the new records, and one for records that have changed.
Select ‘Finish’ and we should see something like this:

Let’s test it out.
For testing purposes, insert 3 rows of data in our source table.
On the first pass, there will be no employee records at all in the
destination (warehouse) table…all records are new.
Run the package, and we should see the following:

Success! Three
rows were pushed through the ‘New Output’, and a quick query shows that we have
3 records in our destination table, each with the original ‘employee_id’ and a
new surrogate key called ‘employee_key’.
By the way, the destination table is configured with an identity key; in
large data volume situations, this may not be ideal, but that is beyond the
scope of this article.
Now let’s modify one of the employee records in the source
system. Run an update statement to
change one of the employee records, modifying the address, city, and state
columns, then run the package again and you should see this:

Success again!
Another quick query confirms that not only has another record with the new
values been inserted into the table, but our ‘is_current’ flag is updated as
well.
The SSIS Slowly Changing Dimension transformation can be a
big help when dealing with Type 1 and Type 2 slowly changing dimensions.
I hope this quick introduction will help you get started using it in your
data transformation work.
Go to Top |
Return to Newsletter Contents
|