|
Return to Newsletter Contents...
Indexes
by: Charles Tournear; Senior Consultant, MCT, MCSE, MCSD,
MCDBA, CRCP
An index is a balanced tree structure that aids in the retrieval of data from
data pages in a table. Understanding
how indexes function in SQL Server can help in deciding which indexes to create
and to make sure they are properly maintained.
The purpose of creating indexes is to improve the performance of Queries.
Many people see transactional systems with a lot of Inserts, Updates, and
Deletes as not needing any indexes, but in order to update a record or delete a
record it has to be found first, which requires a query.
So indexes can improve the performance of both transaction systems and
analytical systems.
Indexes are composed of a key which is one or more columns used to create the
balanced tree in searching for a record.
SQL Server 2005 allows for indexes to include a key used to create the
balanced tree as well as other columns used for faster retrieval.
How? By having indexes cover
a query. An index covers a query
when all columns in the query can be retrieved by one or more indexes.
By including additional columns in the index other than what is being searched
for, this improves the chance of an index covering a query.
Indexes generally are smaller than the original table they are based on
and can thus be searched faster.
Even without a WHERE clause, an index that covers a query might be used to speed
up the retrieval of the data, by having fewer pages to scan.
The basic statement for creating an index is:
CREATE INDEX
indexname ON table(column1, column2, …)
With SQL Server 2005 you can use the INCLUDE key word to add additional columns
other than the key.
CREATE INDEX
indexname ON table(column1, column2, …) INCLUDE (column3, column4,…)
There are two types of indexes in SQL Server:
Clustered and Non-clustered.
Cluster indexes physically sort the data pages in the table itself.
Because of this there can only be one clustered index.
Non-clustered indexes create new table structures that contain pointers
to the rows in the original table.
Clustered indexes are the default index type when creating Primary Keys using
the graphical tools. Non-Clustered
indexes are the default for the Create Index statement unless otherwise
specified. For example CREATE
CLUSTERED INDEX indexname ON table(column).
The query optimizer in SQL Server uses statistics that are associated with an
index to determine whether an index will be useful in processing the query.
Statistics are the distribution of data that matches the key.
For example: for an index on last name, the statistics would hold how
many names begin with A, how many with B and so on.
If 90 percent of the data were under R in the distribution table and the
data you were searching for began with R then the index would not be very
useful, because there would be a large number of pages to search through, though
for any other letter it would be considered useful, because there would be few
pages to search through.
When a compound index (multiple columns) is created statistics are created for
each column independently and for all combinations of the columns.
If the statistics are out of date, then it’s possible that an index will be
ignored when it could be useful, or be used when it would actually slow the
query down. Keeping statistics up
to date therefore is very important.
Statistics are automatically recreated when an index is created or rebuilt.
Many people, when designing a database, created indexes on the tables before any
data had been added. They assumed
because Auto Create Statistics and Auto Update Statistics were turned on under
the Database Options properties that everything was set to maintain statistics.

The Auto Create Statistics and the Auto Update Statistics only update when more
than 10% of the data has changed from the last time update occurred.
If indexes are created on tables before data is added then update
statistics will never run. The
original amount of data to begin with was 0 since there is no last date that
update statistics occurred. Adding
10000 rows of new data, then makes it 0 divided by 10000 or 0% change so update
statistics won’t run. And since
there is no last date the original amount of 0 will continue to be used.
So if you create indexes on tables before adding data you need to manually run
an update statistics statement to initialize the process.
For analytical systems you may want to rebuild indexes or update
statistics depending on the size of the tables after each import of data.
In the example below the database consistency checker function DBCC
SHOW_STATISTICS is used to get information about the statistics for a specific
index. Note the updated column
showing the last time the statistics were updated for this index.
The second statement is an example how the statistics for all indexes on
the Person.address table could be rebuilt.

An additional item that affects whether an index will be used is the query
itself. You should avoid using items
that will prevent the optimizer from being able to select an index.
Avoid changes to datatypes.
For example: WHERE convert(varchar(20),
birthdate, 101) = @datevariable.
Formulas around a column will prevent an index being used on that column;
consider changing the type of the @datevariable to a datetime data type instead
of converting the column. Any
formula using a column other than aggregate functions will prevent indexes being
selected on that column. For example,
the common use of LTRIM(RTRIM(lastname)) will prevent an index on last name from
being used. If you simply cannot
avoid using formulas on a column and that formula is used a lot, consider
creating a calculated column and creating the index on the calculated column.
In summary, indexes can be very useful for finding records through queries as
long as they are properly maintained.
Be sure to periodically rebuild the indexes and update the statistics so
that the query optimizer can make an efficient decision about how to run your
queries.
Go to Top |
Return to Newsletter Contents
|