Performance Tuning:

The goal of performance tuning is to eliminate performance bottlenecks. First identify a performance bottleneck, eliminate it, and then identify the next one, repeating until session performance is acceptable. You can use the test load option under Session --> Properties. The most common performance bottleneck occurs when the PowerCenter Server writes to a target database.

You can identify performance bottlenecks by the following methods:
1. Running test sessions: You can configure a test session to read from a flat file source or to write to a flat file target to identify source and target bottlenecks.
2. Studying performance details: You can create a set of information called performance details to identify session bottlenecks. Performance details provide information such as buffer input and output efficiency.
3. Monitoring system performance: You can use system monitoring tools to view percent CPU usage, I/O waits, and paging to identify system bottlenecks.

Once you determine the location of a performance bottleneck, you can eliminate it by following these guidelines:
1. Eliminate source and target database bottlenecks: Have the database administrator optimize database performance by optimizing the query, increasing the database network packet size, or configuring index and key constraints.
2. Eliminate mapping bottlenecks: Fine-tune the pipeline logic and the transformation settings and options in mappings.
3. Eliminate session bottlenecks: Optimize the session strategy and use performance details to help tune the session configuration.
4. Eliminate system bottlenecks: Have the system administrator analyze information from system monitoring tools and improve CPU and network performance.

The first step in performance tuning is to identify the performance bottleneck. Performance bottlenecks can occur in the source and target databases, the mapping, the session, and the system.
Generally, you should look for performance bottlenecks in the following order:
1. Target
2. Source
3. Mapping
4. Session
5. System

You can identify performance bottlenecks by running test sessions, viewing performance details, and using system monitoring tools.

1. Identifying Target Bottlenecks:
The most common performance bottleneck occurs when the PowerCenter Server writes to a target database. You can identify target bottlenecks by configuring the session to write to a flat file target. If session performance increases significantly when you write to a flat file, you have a target bottleneck. If your session already writes to a flat file target, you probably do not have a target bottleneck. You can optimize session performance by writing to a flat file target that is local to the PowerCenter Server.
Causes of a target bottleneck may include small checkpoint intervals, small database network packet sizes, or problems during heavy loading operations. The following tasks can improve performance when you have a target bottleneck:
- Drop or disable indexes and constraints.
- Use bulk loading.
- Use external loading.
- Increase the database network packet size.
- Increase commit intervals.
- Optimize Oracle target databases.

2. Identifying Source Bottlenecks:
Performance bottlenecks can occur when the PowerCenter Server reads from a source database. If your session reads from a flat file source, you probably do not have a source bottleneck. If you read from a flat file source, you can improve session performance by setting the number of bytes the PowerCenter Server reads per line. If the session reads from a relational source, you can use a Filter transformation, a read test mapping, or a database query to identify source bottlenecks.

A. Using a Filter Transformation: Add a Filter transformation in the mapping after each Source Qualifier. Set the filter condition to false so that no data is processed past the Filter transformation. If the time it takes to run the new session remains about the same, you have a source bottleneck.

B. Using a Read Test Mapping: A read test mapping isolates the read query by removing the transformations in the mapping. Use the following steps to create a read test mapping:
   1. Make a copy of the original mapping.
   2. In the copied mapping, keep only the sources, Source Qualifiers, and any custom joins or queries.
   3. Remove all transformations.
   4. Connect the Source Qualifiers to a file target.
   Run the read test mapping in a test session. If its run time is similar to the original session, you have a source bottleneck.

C. Using a Database Query: You can identify source bottlenecks by executing the read query directly against the source database:
   1. Copy the read query directly from the session log.
   2. Execute the query against the source database with a query tool.
   3. Measure the query execution time and the time it takes for the query to return the first row.
   If there is a long delay between the two time measurements, you can use an optimizer hint to eliminate the source bottleneck.

Causes of a source bottleneck may include an inefficient query or small database network packet sizes. The following tasks can improve performance when you have a source bottleneck:
- Use conditional filters.
- Use indexes wherever possible.
- Increase the database network packet size.
- Optimize the query.
- Create tempdb as an in-memory database.
- Connect to Oracle databases using the IPC protocol.
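The database-query test above compares two timings: how long the query takes to return its first row and how long it takes to return all rows. The following is a minimal Python sketch of that measurement, not an Informatica feature. It uses an in-memory SQLite table as a stand-in for the real source database; the function name time_read_query and the orders table are hypothetical. In practice you would run the read query copied from the session log through whatever client your source database provides.

```python
import sqlite3
import time


def time_read_query(conn, sql):
    """Return (seconds_to_first_row, seconds_to_all_rows, row_count) for one query."""
    start = time.perf_counter()
    cursor = conn.execute(sql)
    first = cursor.fetchone()          # blocks until the first row is available
    first_row_s = time.perf_counter() - start
    rows = 0 if first is None else 1
    for _ in cursor:                   # drain the remaining rows
        rows += 1
    total_s = time.perf_counter() - start
    return first_row_s, total_s, rows


# Stand-in source (assumption): an in-memory SQLite table with made-up data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(i, i * 1.5) for i in range(10000)],
)

first_row_s, total_s, row_count = time_read_query(
    conn, "SELECT order_id, amount FROM orders ORDER BY amount"
)
print(f"first row after {first_row_s:.4f}s, all {row_count} rows after {total_s:.4f}s")
```

A large gap between the two numbers is the situation in which the notes above suggest trying an optimizer hint.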
3. Identifying Mapping Bottlenecks:
If you determine that you do not have a source or target bottleneck, you might have a mapping bottleneck. You can identify mapping bottlenecks by adding a Filter transformation in the mapping before each target definition. Set the filter condition to false so that no data is loaded into the target tables. If the time it takes to run the new session is the same as the original session, you have a mapping bottleneck. You can also identify mapping bottlenecks by using performance details: high Errorrows and Rowsinlookupcache counters indicate a mapping bottleneck.

Mapping-level optimization may take time to implement but can significantly boost session performance. Generally, you reduce the number of transformations in the mapping and delete unnecessary links between transformations. Configure the mapping with the fewest transformations and expressions that can do the required work, and minimize the amount of data moved by deleting unnecessary links between transformations. For transformations that use a data cache (such as Aggregator, Joiner, Rank, and Lookup transformations), limit the connected input/output or output ports. Limiting these ports reduces the amount of data the transformations store in the data cache.

You can also perform the following tasks to optimize the mapping:
- Configure single-pass reading.
- Optimize data type conversions.
- Eliminate transformation errors.
- Optimize transformations.
- Optimize expressions.

4. Identifying Session Bottlenecks:
If you do not have a source, target, or mapping bottleneck, you may have a session bottleneck. You can identify a session bottleneck by using the performance details.
The PowerCenter Server creates performance details when you enable Collect Performance Data in the Performance settings on the Properties tab of the session properties. Performance details display information about each Source Qualifier, target definition, and individual transformation. All transformations have some basic counters that indicate the number of input rows, output rows, and error rows.

Small cache sizes, low buffer memory, and small commit intervals can cause session bottlenecks. You can perform the following tasks to improve session performance:
- Increase the cache size.
- Run concurrent sessions.
- Partition the session.
- Remove staging areas.
- Turn off session recovery.
- Reduce error tracing.
- Increase buffer memory.
- Increase commit intervals.
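The test-session comparisons described in the target, source, and mapping steps can be summarized as a small decision procedure. The Python sketch below is only an illustration of that ordering. The function names and the two thresholds (what counts as "significantly faster" and "about the same") are assumptions chosen for the example; the notes do not give numeric cut-offs.

```python
def about_the_same(test_s, original_s, tolerance):
    """True when a test run time is within `tolerance` (a fraction) of the original."""
    return abs(test_s - original_s) <= tolerance * original_s


def locate_bottleneck(original_s, flat_file_target_s, source_filter_s,
                      target_filter_s, significant_gain=0.25, tolerance=0.10):
    """Classify a bottleneck from four session run times, in seconds.

    original_s          run time of the unmodified session
    flat_file_target_s  run time when writing to a flat file target
    source_filter_s     run time with a FALSE filter after each Source Qualifier
    target_filter_s     run time with a FALSE filter before each target
    Both thresholds are assumptions, not values from the notes.
    """
    # 1. Target: writing to a flat file is significantly faster.
    if flat_file_target_s <= original_s * (1 - significant_gain):
        return "target"
    # 2. Source: discarding every row after the read barely changes the time.
    if about_the_same(source_filter_s, original_s, tolerance):
        return "source"
    # 3. Mapping: skipping the target load barely changes the time.
    if about_the_same(target_filter_s, original_s, tolerance):
        return "mapping"
    # 4. Otherwise look at the session (performance details).
    return "session"


print(locate_bottleneck(600, 200, 580, 590))  # target
print(locate_bottleneck(600, 590, 580, 300))  # source
print(locate_bottleneck(600, 590, 120, 585))  # mapping
print(locate_bottleneck(600, 590, 120, 150))  # session
```

The checks run in the same order the notes recommend (target, source, mapping, session), so an earlier bottleneck is reported before a later one is considered.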
5. Identifying System Bottlenecks:
After you tune the source, target, mapping, and session, you may consider tuning the system. You can identify system bottlenecks by using system tools to monitor CPU usage, memory usage, and paging. On Windows, you can use system tools in the Task Manager or Administrative Tools. On UNIX systems, you can use system tools such as vmstat and iostat to monitor system performance.

How to improve Session Performance:
The goal of performance tuning is to optimize session performance so that sessions run within the available load window for the Informatica Server. You can improve session performance in the following ways:
1. Run concurrent sessions: Running sessions concurrently by using batches reduces the time needed to load the data, so concurrent batches may improve session performance.
2. Partition the session (PowerCenter): Partitioning improves session performance by creating multiple connections to sources and targets and loading data in parallel pipelines.
3. Turn off session recovery.
4. Reduce error tracing.
5. Avoid small cache sizes.
6. Avoid low buffer memory.
7. Avoid small commit intervals.
8. Tune parameters: DTM buffer pool, buffer block size, index cache size, data cache size, commit interval, and tracing level. If the allocated data or index cache is not large enough to store the data, the server stores the data in a temporary disk file as it processes the session data. Each time the server pages to disk, performance slows. You can see this in the performance counters. Because the data cache generally holds more data than the index cache, allocate more memory to the data cache than to the index cache.
9. Staging areas: If you use staging areas, you force the Informatica Server to perform multiple data passes. Removing staging areas may improve session performance.
10. Running the Informatica Server in ASCII mode improves session performance, because ASCII mode stores a character value in one byte, while Unicode mode takes 2 bytes to store a character.
11. Aggregator, Rank, and Joiner transformations often decrease session performance because they must group data before processing it. To improve session performance in this case, enable the Sorted Input option.
12. Flat files: If your flat files are stored on a machine other than the Informatica Server machine, move those files to the machine that hosts the Informatica Server.
13. Relational data sources: Minimize the connections between sources, targets, and the Informatica Server to improve session performance. Moving the target database onto the server system may improve session performance.
14. If a session joins multiple source tables in one Source Qualifier, optimizing the query may improve performance. Single-table SELECT statements with an ORDER BY or GROUP BY clause may also benefit from optimization such as adding indexes.
15. You can improve session performance by increasing the network packet size, which allows more data to cross the network at one time. To do this, go to the Server Manager and configure the database connections under the server configuration.

If your target has key constraints and indexes, they slow the loading of data. To improve session performance in this case, drop the constraints and indexes before you run the session and rebuild them after the session completes.

Aggregator Performance:
1. Enable the Sorted Input option: Use sorted input to decrease the use of aggregate caches. It reduces the amount of data cached during the session and improves session performance. Use this option with the Sorter transformation to pass sorted data to the Aggregator transformation.
2. Incremental aggregation: This calculates the summaries for new records by using the aggregate cache, which improves session performance.
3. Limit the number of connected input/output ports: Limit the number of connected input/output or output ports to reduce the amount of data the Aggregator transformation stores in the data cache.
4. Group by simpler columns: Group by simple columns, preferably numeric columns.
5. Filter before aggregating: If you use a Filter transformation in the mapping, place it before the Aggregator transformation to reduce unnecessary aggregation.
6. Increase the data and index cache sizes: You can improve session performance by increasing the index and data cache sizes in the transformation properties.

Filter Performance:
1. Use the Filter transformation early in the mapping: Place the Filter transformation early in the mapping to remove unnecessary rows as soon as possible.
2. Use a Source Qualifier filter to reduce the number of rows used throughout the mapping: The Source Qualifier transformation provides an alternate way to filter rows. Rather than filtering rows from within a mapping, the Source Qualifier transformation filters rows when they are read from a source. The main difference is that the Source Qualifier limits the row set extracted from a source, while the Filter transformation limits the row set sent to a target.
Because the Source Qualifier filter runs in the database, make sure that its filter condition uses only standard SQL. Because the Source Qualifier reduces the number of rows used throughout the mapping, it provides better performance.

Joiner Performance:
1. Enable Sorted Input: It improves the performance of data joins. The port on which the join condition is defined must be sorted. When you enable the Sorted Input option in the Joiner transformation, the PowerCenter Server improves session performance by minimizing disk input and output. You see the greatest performance improvement when you work with large data sets.
2. Partition the pipeline: You can increase the number of partitions in a pipeline to improve session performance. When you partition a session that uses a Joiner transformation requiring sorted input, you must verify that the Joiner transformation receives sorted data. Partitions that redistribute rows can rearrange the order of sorted data, so it is important to configure partitions to maintain sorted data.
3. Select the join type: A normal or master outer join performs faster than a full outer or detail outer join.
4. Perform joins in a database when possible: Performing a join in a database is faster than performing a join in the session. In some cases this is not possible, such as when joining tables from two different databases or flat file systems. If you want to perform a join in a database, you can use the following options:
   A. Create a pre-session stored procedure to join the tables in a database.
   B. Use the Source Qualifier transformation to perform the join.
5. For an unsorted Joiner transformation, designate as the master source the source with fewer rows: For optimal performance and disk storage, designate the source with fewer rows as the master source. During a session, the Joiner transformation compares each row of the master source against the detail source. The fewer unique rows in the master, the fewer iterations of the comparison occur, which speeds the process.
6. For a sorted Joiner transformation, designate as the master source the source with fewer duplicate key values: For optimal performance and disk storage, designate the source with fewer duplicate key values as the master source. When the PowerCenter Server processes a sorted Joiner transformation, it caches rows for one hundred keys at a time. If the master source contains many rows with the same key value, the PowerCenter Server must cache more rows, and performance can be slowed.
7. Run the PowerCenter Server in ASCII mode: When you run the PowerCenter Server in Unicode mode, it uses the selected session sort order to sort character data. When you run the PowerCenter Server in ASCII mode, it sorts all character data using a binary sort order. To ensure that data is sorted as the PowerCenter Server requires, the database sort order must be the same as the user-defined session sort order. If you pass unsorted or incorrectly sorted data to a Joiner transformation configured to use sorted data, the session fails and the PowerCenter Server logs the error in the session log file.

Lookup Performance:
1. Enable the lookup cache: Enable lookup caching and increase the lookup cache size to improve session performance.
2. Remove unwanted columns: Remove unwanted columns from the lookup table.
3. Look up small tables: Lookups on small tables improve session performance.
4. Enable Sorted Input: Enabling sorted input improves performance for flat file lookups.
5. Add an index to the lookup table: You can create an index on the lookup table if you have permission. This improves performance for both cached and uncached lookups and is important for very large lookup tables. Because the PowerCenter Server needs to query, sort, and compare values in these columns, the index needs to include every column used in a lookup condition.
   A. Cached lookups: You can improve performance by indexing the columns in the lookup ORDER BY. The session log contains the ORDER BY statement.
   B. Uncached lookups: Because the PowerCenter Server issues a SELECT statement for each row passing into the Lookup transformation, you can improve performance by indexing the columns in the lookup condition.
6. Create an unconnected Lookup: An unconnected Lookup is not directly part of the data flow, so you can call it many times in the mapping. Use it when you want to return only one port. It supports only relational sources and a static cache.
7. Use a persistent lookup cache for static lookups: This type of cache is shared among multiple sessions. If the lookup source does not change between sessions, configure the Lookup transformation to use a persistent lookup cache. The PowerCenter Server then saves and reuses cache files from session to session, eliminating the time required to read the lookup source.
8. Use a lookup override: A lookup SQL override replaces the default SQL statement. If both sources are relational, you can use a lookup override to join multiple sources. By default, the Informatica Server adds an ORDER BY clause. You can override the lookup SQL in the following circumstances to improve performance:
   (1) Override the ORDER BY statement: Override the ORDER BY statement with fewer columns to improve performance. When you override the ORDER BY statement, you must suppress the generated ORDER BY statement by ending your statement with the two-dash comment notation ('--'). Ex:
       SELECT ITEMS_DIM.ITEM_NAME, ITEMS_DIM.PRICE FROM ITEMS_DIM ORDER BY ITEMS_DIM.ITEM_ID, ITEMS_DIM.PRICE --
   (2) Add a WHERE statement: Use a lookup SQL override to add a WHERE statement to the default SQL statement. You might want to use this to reduce the number of rows included in the cache.
   Note: The session fails if you include large object ports in a WHERE clause.
9. Use the '=' operator first when there are several conditions: If a Lookup transformation specifies several conditions, you can improve lookup performance by placing all the conditions that use the equality operator ('=') first in the list of conditions on the Condition tab.
10. Divide the lookup mapping into two pipelines:
   A. Dedicate one pipeline to inserts (source - target): these are the new rows. Only the new rows enter the mapping, so the process is fast.
   B. Dedicate the second pipeline to updates (source = target): these are the existing rows. Only the rows that already exist enter the mapping.
11. Keep cache files on the same machine: Cache files should be on the machine where the Informatica Server is installed, which reduces access time.
12. Use a shared cache or reuse the lookup: If the same lookup SQL is used in another lookup, use a shared cache or reuse the lookup.
13. Connect with a native database driver: The PowerCenter Server can connect to a lookup table using a native database driver or an ODBC driver. Native database drivers improve session performance.
14. Join tables in the database instead of using a Lookup: If the lookup table is in the same database as the source table in your mapping and caching is not feasible, join the tables in the source database rather than using a Lookup transformation.

Sequence Generator Performance:
1. Create a reusable Sequence Generator: Create a reusable Sequence Generator transformation and use it in multiple mappings. You might reuse a Sequence Generator when you perform multiple loads to a single target.
2. The Number of Cached Values property determines the number of values the Informatica Server caches at one time. For non-reusable Sequence Generator transformations, Number of Cached Values is set to zero by default. For reusable Sequence Generator transformations, it is set to 1000 by default.

Expression Performance:
1. Use common logic.
2. Minimize aggregate function calls.
3. Replace common sub-expressions with local variables.
4. Use operators instead of functions.

What are the main parameters to increase the Informatica Server performance:
Before doing tuning that is specific to Informatica:
1. Check hard disks on related machines. (Slow disk access on source and target databases, source and target file systems, and the Informatica Server and repository machines can slow session performance.)
2. Improve network speed. (Slow network connections can slow session performance.)
3. Check CPUs on related machines. (Make sure the Informatica Server and related machines run on high-performance CPUs.)
4. Configure physical memory for the Informatica Server to minimize disk I/O. (Configure the physical memory for the Informatica Server machine to minimize paging to disk.)
5. Optimize the database configuration.
6. Staging areas: If you use a staging area, you force the Informatica Server to perform multiple passes on your data. Where possible, remove staging areas to improve performance.
7. You can run multiple Informatica Servers on separate systems against the same repository. Distributing the session load to separate Informatica Server systems increases performance.
Informatica specific:
- Transformation tuning
- Using caches
- Avoiding Lookups by using DECODE for smaller and frequently used tables
- Applying filters at the earliest point in the data flow, etc.

How the Informatica Server increases the session performance:
For relational sources, the Informatica Server creates multiple connections, one for each partition of a single source, and extracts a separate range of data for each connection. It reads multiple partitions of a single source concurrently. Similarly, for loading, the Informatica Server creates multiple connections to the target and loads partitions of data concurrently. For XML and file sources, the Informatica Server reads multiple files concurrently. For loading, it creates a separate file for each partition (of a source file). You can choose to merge the targets.

There are 10 lookups in a mapping and performance is slow, so how can we find out which lookup is responsible:
In the session log file, check which lookup has the maximum number of records and how long it takes to build its cache. If a lookup has a SQL override, execute the query directly in the database and see how much time it takes. You can also start the mapping and keep refreshing the session log file; while the session is processing a lookup, it waits for that lookup's cache to be created.

I have 20 Lookups, 10 Joiners, and 1 Normalizer, how will you improve the session performance:
We have to calculate the Lookup and Joiner cache sizes.

Cache Formula:
For Aggregator Transformation:
   Index Cache = no. of groups × [(Σ column size) + 17]
   Data Cache = no. of groups × [(Σ column size) + 7]
For Lookup Transformation:
   Index Cache = no. of rows in lookup table × [(Σ column size) + 16]
   Data Cache = no. of rows in lookup table × [(Σ column size) + 8]
For Joiner Transformation:
   Index Cache = no. of master rows × [(Σ column size) + 16]
   Data Cache = no. of master rows × [(Σ column size) + 8]
NOTE: The symbol Σ is sigma. It means the sum of all the column sizes.

The calculations for data and index caches are given in detail in the Informatica manual. If the cache size is smaller than required, the excess data is paged to hard disk, which slows the session and reduces performance. If the cache size is larger than required, it takes more space in main memory, and if that memory is not available, the session fails.

If we have 4 lookups, how do you increase the performance:
1. First decide whether a cache is essential, depending on the number of source and lookup rows.
2. Use a lookup override to join multiple lookups, which can improve session performance.
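The cache formulas given earlier are simple arithmetic, so they are easy to check with a short script. The Python sketch below encodes the per-row overhead constants from those formulas. The function name cache_size_bytes, the example row counts, and the example column sizes are hypothetical, and the sketch assumes column sizes are expressed in bytes, which the notes do not state explicitly.

```python
# Per-row overhead added to the summed column sizes, taken from the formulas
# in these notes (units assumed to be bytes).
OVERHEAD = {
    "aggregator": {"index": 17, "data": 7},
    "lookup": {"index": 16, "data": 8},
    "joiner": {"index": 16, "data": 8},
}


def cache_size_bytes(transformation, cache, row_count, column_sizes):
    """Estimate a cache size as row_count * (sum(column_sizes) + overhead).

    row_count is the number of groups (Aggregator), lookup table rows (Lookup),
    or master rows (Joiner).
    """
    return row_count * (sum(column_sizes) + OVERHEAD[transformation][cache])


# Hypothetical lookup table: 1,000,000 rows, made-up column sizes.
lookup_index = cache_size_bytes("lookup", "index", 1_000_000, [10, 4])
lookup_data = cache_size_bytes("lookup", "data", 1_000_000, [50, 8, 8])
print(lookup_index)  # 30000000
print(lookup_data)   # 74000000

# Hypothetical aggregator: 5,000 groups, one 12-byte column in the index cache.
print(cache_size_bytes("aggregator", "index", 5_000, [12]))  # 145000
```

Treat the results as starting estimates only; the notes point to the Informatica manual for the detailed calculation, including which columns belong to each cache.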