The Result Data Newsletter   
Volume 709 - September 2007   
© Copyright 2007 Result Data Consulting, Ltd.  614-505-0770  www.resultdata.com   

    Result Data Home Page  |  Newsletter Archive  |  Upcoming Events  |  Classes & Workshops  |  Request Information
Upcoming Events:  Mid Ohio BusinessObjects User Group: 11/14, Data to Dashboards:  11/14

Return to Newsletter Contents...

 

Data Integrator Optimization Tuning Techniques Introduction

by: Tom Hinkle, Consultant, CRCP, Data Integrator Instructor

Source and Target Based Tuning techniques

 

Source-based performance options

Using array fetch size

Data Integrator provides an easy way to request and obtain multiple data rows from source databases. The array fetch size feature allows you to retrieve data using fewer requests, thereby significantly reducing traffic on your network. Tuning array fetch size can also reduce the amount of CPU use on the Job Server computer. 

Array Fetch Size indicates the number of rows returned in a single fetch call to a source table. The default value is 1000 and the maximum value is 5000. This value reduces the number of round-trips to the database and can improve performance for table reads.

Caching data

You can improve the performance of data transformations that occur in memory by caching as much data as possible. By caching data, you limit the number of times the system must access the database. Cached data must fit into available memory. Select Yes in a source table or file editor to specify that data from the source is cached using memory on the Job Server computer.

Caching Tables

Business Objects recommends that you apply table caching for small tables (typically   with a table size of less than 1% of total physical memory).  You can specify the number of rows in a table to override the default value Data Integrator uses during execution. To specify a table size, set the Estimated row count attribute for the table. The default row count value is 50,000.

 

Caching joins

Cache a source only if it is being used as an inner join. (Inner joins have a lower join rank than outer joins). Caching does not affect the order in which tables are joined. If optimization conditions are such that Data Integrator is pushing down operations to the underlaying database, it ignores your cache setting. If a table becomes too large to fit in the cache, disable caching for the table.

 

Caching lookups

You can also improve performance by caching data when looking up individual values from tables and files.

 

There are two methods of looking up data:

 

Using a Lookup function in a query

Data Integrator has three Lookup functions: lookup, lookup_seq, and lookup_ext. The lookup and lookup_ext functions have cache options. Caching lookup tables improves performance because Data Integrator avoids the expensive task of creating a database query on each row.

 

Using a source table and setting it as the outer join

Although you can use lookup functions inside Data Integrator queries, an alternative is to expose the translate (lookup) table as a source table in the data flow diagram, and use an outer join (if necessary) in the query to look up the required data.

 

Caching table comparisons

You can improve the performance of a Table_Comparison transform by caching the comparison table.

There are three modes of comparisons:

Row-by-row select

Cached comparison table

Sorted input

Of the three, Row-by-row select will likely be the slowest and Sorted input the fastest.

 

Join ordering

Join ordering is determined by the type of join you choose to use in Data Integrator. There are two types of joins:  

Normal — Created by drawing connection lines from multiple sources in a data flow to the same query and then entering an integer as a Join rank value for each source using source editors. Data Integrator orders join operations using these values.

 

Outer Join — Created by taking a normal join and then using the Outer Join tab of the query to specify the outer and inner sources. Data Integrator processes the outer source as if it was assigned the highest join rank.

 

For a join between two tables, assign a higher join rank value to the larger  table and, if possible, cache the smaller table. For a join between a table and file:

o    If the file is small and can be cached, then assign it a lower join rank value and cache it.

o    If you cannot cache the file, then assign it a higher join rank value so that it becomes the “outer table” in the join.

For a join between two files, assign a higher rank value to the larger file and, if possible, cache the smaller file.

 

Minimizing extracted data

The best way to minimize the amount of data extracted from the source systems is to retrieve only the data that has changed since the last time you performed the extraction. This technique is called changed-data capture, and it is described in detail in the Data Integrator Designer Guide, Chapter 19: Techniques for Capturing Changed Data .

Target-based performance options

 

Loading method and rows per commit

You can choose to use regular loading or bulk loading with Data Integrator.

To improve performance, you can select the following options from the target table editor. Note that if you use one, you cannot use the others for the same target.

Enable Partitioning

Parallel loading option. The number of parallel loads is determined by the number of partitions in the target table.

Number of Loaders

Parallel loading option. The number of parallel loads is determined by the number you enter for this option.

Transactional loading

This option allows you to commit data to multiple tables as part of the same transaction.

 

Rows per commit for regular loading defaults to 1000 rows. Setting the Rows per commit value significantly affects job performance. Adjust the rows per commit value in the target table editor’s Options tab, noting the following rules:

Do not use negative number signs and other non-numeric characters.

If you enter nothing or 0, the text box will automatically display 1000.

If you enter a number larger than 5000, the text box automatically displays 5000, which is the maximum value allowed.

 

Staging tables to speed up auto-correct loads

For large loads where auto-correct is required:

• For Oracle targets, create two data flows:

• In the first data flow, extract data and load it to a staging table using a temporary table object that resides in the same datastore as your target table.

• In the second data flow, use the staging table as the source and an Oracle database table as the target. In the target table editor, select Auto-correct load. This entire statement will be pushed down to the database, maximizing performance.

 

• For other targets, load data into a staging table and use post-load SQL, a script, or a data flow to:

• Delete rows from the target where the key is in the staging table.

• Insert rows from the staging table into the target.

 

Additional information can be found in the Data Integrator Performance Optimization Guide

 

Go to Top  |  Return to Newsletter Contents

The Result Data Newsletter is published approximately once a month to share the latest information on business intelligence, data management and CRM. There should be a link below to allow you to change or remove yourself from our list. We take your requests very seriously. If you have any difficulty please contact us at 614-505-0770 and we will make sure that your request is handled properly. This is not intended to be an unsolicited message and you can reach us in person if needed.

© Copyright 2007 Result Data Consulting, Ltd. - All Rights Reserved
All trademarks and copyrights are the property of their respective owners. This information is provided without warranty.
Announcements
Fall Training Special
Get a jump on your Christmas shopping!  Schedule and attend any public training class now through December and receive a FREE $50 gift card to Target, Barnes & Nobles or Best Buy (restrictions apply).  Call 614-505-0770 for further details and restrictions.
BI Designed and Priced for Mid-Sized Companies
Join us on Sept. 26 for a special presentation just for mid-sized businesses and find out how companies like your are improving their performance using business intelligence (BI).  Call 614-505-0770 or click here to reserve your seat.
Next MOBOUG Meeting
The next Mid-Ohio Business Objects User Group (MOBOUG) meeting is November 14, 2007.  Call 614-505-0770 or click here to reserve your seat.
Data to Dashboards
Join us on November 14th to learn about the methods and tools that can transform your data into highly functional and robust dashboards, reports and analytics.  Data-to-Dashboards will provide an overview as well as live demonstrations of the complete process that collects, transforms and delivers your data using industry leading Business Intelligence software.  Enjoy a comprehensive review of the concepts and tools that deliver top quality BI content to your user community.  Call 614-505-0770 or click here to reserve your seat.
Looking for a Few Good Men and Women
Join our award winning team of Business Intelligence consultants and .Net Software developers.
Send your resume and salary requirements to:
jobs@resultdata.com