|
Return to Newsletter Contents...
Excel Table Analysis
by: Michael Mullin, BI Consultant
One aspect of Business Intelligence (BI) is data mining.
You can think of data mining as finding
information in your data that is not readily apparent.
For instance, a business might have an order processing system running on
SQL Server. Such a system would have a
set of tables containing all the data it has on its customers and another set of
tables containing all the data about orders. The system is designed to handle
orders. If you were to request
information for John Smith’s order placed on May 1st, this system
could provide it. That is not data
mining. The system is simply retrieving
data it was designed to create, store and retrieve.
However, can the order processing system tell you which age group is most likely
to by product x during the month of July? Probably
not. The system has created and
maintained the records which contain the data, but the information is not what
the system was created to retrieve. The information is hidden within this data.
Retrieving this information is called data mining.
You are not creating new data; you are
simply manipulating data that already exists. Personally,
I think the process would be better named “Information Mining”, but I didn’t get
to vote on the name.
Microsoft has been busily creating and enhancing products to support BI
activities, so it is no surprise to learn that Excel has received new tools to
perform BI analysis. Microsoft has
released a set of office add-ins called “Microsoft SQL Server 2005 Data Mining
Add-Ins for Office 2007” which is a set of three separate Office add-ins to be
used with Office 2007. Sorry, no backward
compatibility here, you have to have the latest version.
Two of these add-ins are for Excel.
The add-in I will write about this month is the Table Analysis Tools for Excel.
As the name implies, these tools analyze
tables. But these are Excel tables that
are being analyzed, not SQL Server tables. SQL
Server 2005 is used by the add-in to perform analysis services, but the user
does not need to directly interface with SQL Server.
The user needs to connect to Analysis
Services of SQL Server 2005, but the connection is created from within Excel.
A table in Excel is a range of rows and columns which has been formatted as a
table. To format a range of data as table
in Excel 2007, select the range and click on ‘Format as Table’ in the style
section of the ribbon menu. Select a
style for the table and follow the prompts.
After you have installed the add-in and created a table, you click inside the
table to activate the Table Tools menu option. When
you select the Analyze tab, the standard ribbon menu is replaced with a ribbon
that contains Table Analysis options. Once
you move the focus outside the table the standard ribbon menu is displayed and
the Table Tools option is hidden.
Table Analysis consists of six tools Analyze Key, Detect Categories, Fill From
Example, Forecast, Highlight Exceptions and Scenario Analysis.
Whenever the user selects a tool, a
simply wizard is presented to step the user through the tools use.
Analyze Key Influences
This tool will examine a set of records and determine which values in a set of
fields influences a key value in the record.
For example, a table might contain customer data such as age, income,
city, state, zip, marital status, number of children and the amount spent on
purchases of entertainment products from your company in the last 12 months.
The Analyze Key Influences tool will
analyze this table and generate reports, which are embedded into the worksheet
indicating which fields best predicted entrainment purchases.
Detect Categories
The same table can be analyzed for categories. Detect
Categories will examine the rows and try to group them into categories of
similar customers. It will produce
embedded reports, listing the categories and which values in the fields
correspond to that category. It will also
append a category column to the table and label each row with the appropriate
category. It will also produce a stacked
bar chart showing the distribution of column values between the categories.
Although the tool names the categories as
Category1, Category2 … you can easily assign meaningful names.
Fill From Example
This tool automatically fills in missing data in a specific column for all rows
in the selected table, based on sample values provided by you for some of the
rows. It also creates a report which
describes the rules that were used to fill in the missing data.
One might wonder why this is considered a data mining tool, since it is not
extracting derived information but rather inserting data.
However, the tool uses SQL Server Data
Mining to analyze the table and determine which value is the best value to
insert into a row.
Forecast
Forecast is a prediction tool. It
performs time series analysis on the data in the table to project future values.
Like the Detect Categories tool, Forecast
will add a new column to the table which contains the predicted values.
It will also generate a chart which shows
the existing values and the projected values.
Highlight Exceptions
This tool analyzes the table’s rows that do not appear to match the general
patterns found in the rest of the table. These
unusual values could be due to data entry errors or they may be values pointing
to exceptional customers. The tool not
only highlights the row, but also indicates the field with the unusual value.
Analysis of these exceptions can lead to
elimination of bad data, or discovery of a special category.
It may also reveal data which might be
excluded from statistical analysis in order to avoid skewing the data.
Scenario Analysis
There are two types of scenarios that this tool will provide: ‘Goal Seek’ and
‘What If.’
Table Analysis provides a number of very useful tools.
However, once the tools have been used,
the spreadsheet can be viewed by other users who have not installed the add-in.
You can even publish the worksheet to
Excel Services on SharePoint and share it with everyone.
Go to Top |
Return to Newsletter Contents
|