The Result Data Newsletter   
Volume 712 - December 2007   
© Copyright 2007 Result Data Consulting, Ltd.  614-505-0770  www.resultdata.com   

    Result Data Home Page  |  Newsletter Archive  |  Upcoming Events  |  Classes & Workshops  |  Request Information
Upcoming Events:  Microsoft Data Mgt & BI Seminar: 12/14, Microsoft SQL Server User Group: 12/14

Return to Newsletter Contents...

 

Using the SSIS Slowly Changing Dimension Transformation

by: Sean Beal, Sr. Consultant, CRCP

Newly available in SQL Server Integration Services (SSIS) is the Slowly Changing Dimension Transformation.  There are several types of slowly changing dimensions; let’s take a look at using this transformation in the context of a Type 2 slowly changing dimension.

With a Type 2 slowly changing dimension, we track not only the current state of the record in question, but also historical instances by adding a new record each time that there is change in an attribute (or group of attributes) which we consider important.  Consider the following employee entity (our data source):

Pic 1

We’ve decided that in our data warehouse it is important to track the fact that a particular employee happened to change their address.  We want to keep the original record and add a new record with the ‘fresh’ information.  We also want to keep track of which record is the current record in order to make later query work a little easier.  Of course, we’ll have to add a surrogate key to our destination entity, because our ‘employee_id’ will be duplicated every time we get updated data.  Our destination employee schema might look like this:

Pic 2

Let’s dive in!  We’ll create a new Integration Services Project, and drag a Data Flow Task onto our Control Flow workspace.  Double clicking the Data Flow Task (or selecting it and clicking on the Data Flow tab) will allow us to add the Slowly Changing Dimension transformation to our Data Flow workspace.  Of course, the transformation will need an input source, so we’ll need to configure our data source as well.  In this case, the data source will be our employee entity from the source system for employee data, and we’ll use an OLE DB Source.  After configuring the OLE DB source and connecting it to the Slowly Changing Dimension transformation, we should see something like this:

Pic 3

Now, double click on the Slowly Changing Dimension transform, and we’ll be presented with the Slowly Changing Dimension Wizard.

The first step in the Wizard is to connect to our dimension table (our destination).  We’re going to use the employee entity in our data warehouse, shown above.  We also need to choose a business key.  The business key is typically the primary key in the source system; in our case, the business key will be the ‘employee_id’ field from our source system.

 The next step in the Wizard is to set up the change types for our columns.  This is how we tell the transformation on a column by column basis what to do with changes.  The Wizard presents us with three different change types from which to choose:

Fixed Attribute – select this when the column value should not change.  Any changes detected in this column will be treated as an error.

Changing Attribute – select this when changing values should overwrite existing values.  We’d choose this if we were doing a Type 1 slowly changing dimension.

Historical Attribute – select this when changes to a column are saved as new records; previous records are marked as outdated.  This is a Type 2 Slowly Changing dimension.

We’ll set each of our columns to the Historical Attribute change type.  On the next step of the Wizard, we can choose the column we want to use to show which records are current.  In our case, we’ll use the ‘is_current’ column, and because it is a Boolean field, we’ll choose ‘True’ for our value when current and ‘False’ for our expiration value.  We could also use start and end date columns to show when a record is active or expired.

The next step of the Wizard allows us to configure what are called Inferred Dimension Members.  These are used when a fact table is referencing dimension members that haven’t yet been loaded.  We won’t be using this in our example, so let’s disable it.

We’re finished!  The Wizard will now indicate the outputs that will be created.  Because we are configured this transformation for a Type 2 Slowly Changing Dimension, we should only have two:  one for the new records, and one for records that have changed.  Select ‘Finish’ and we should see something like this:

Pic 4

Let’s test it out.  For testing purposes, insert 3 rows of data in our source table.  On the first pass, there will be no employee records at all in the destination (warehouse) table…all records are new.  Run the package, and we should see the following:

Pic 5

Success!  Three rows were pushed through the ‘New Output’, and a quick query shows that we have 3 records in our destination table, each with the original ‘employee_id’ and a new surrogate key called ‘employee_key’.  By the way, the destination table is configured with an identity key; in large data volume situations, this may not be ideal, but that is beyond the scope of this article.

Now let’s modify one of the employee records in the source system.  Run an update statement to change one of the employee records, modifying the address, city, and state columns, then run the package again and you should see this:

Pic 6

Success again!  Another quick query confirms that not only has another record with the new values been inserted into the table, but our ‘is_current’ flag is updated as well.

The SSIS Slowly Changing Dimension transformation can be a big help when dealing with Type 1 and Type 2 slowly changing dimensions.  I hope this quick introduction will help you get started using it in your data transformation work. 

Go to Top  |  Return to Newsletter Contents

 

The Result Data Newsletter is published approximately once a month to share the latest information on business intelligence, data management and CRM. There should be a link below to allow you to change or remove yourself from our list. We take your requests very seriously. If you have any difficulty please contact us at 614-505-0770 and we will make sure that your request is handled properly. This is not intended to be an unsolicited message and you can reach us in person if needed.

© Copyright 2007 Result Data Consulting, Ltd. - All Rights Reserved
All trademarks and copyrights are the property of their respective owners. This information is provided without warranty.
Announcements
Quarter 1 Training Special
Schedule and attend any public training class now through March 31st and receive 10% off the normal class price OR opt for a gift certificate to the Apple Store for that same dollar amount. Restrictions apply and you must mention the promotional code Apple08 at the time of registration to receive the promotion.  Call 614-505-0770 for further details and restrictions.
Next Microsoft SQL Server User Group
The next Microsoft User Group is on Dec. 14, 2007.  Call 614-505-0770 or click here to reserve your seat.
Next Microsoft Data Management & BI Seminar
The next free Microsoft BI seminar is on Dec. 14, 2007.  Call 614-505-0770 or click here to reserve your seat.
Looking for a Few Good Men and Women
Join our award winning team of Business Intelligence consultants and .Net Software developers.
Send your resume and salary requirements to:
jobs@resultdata.com