Performance Tuning the Siebel Change Capture Process in DAC
DAC performs the change capture process for Siebel source systems. This process has two components:
1) The change capture process occurs before any task in an ETL process runs.
2) The change capture sync process occurs after all of the tasks in an ETL process have completed successfully.
Supporting Source Tables:-
The source tables that support the change capture process are as follows:
S_ETL_I_IMG. Used to store the primary key (along with MODIFICATION_NUM and LAST_UPD) of the records that were either created or modified since the time of the last ETL.
S_ETL_R_IMG. Used to store the primary key (along with MODIFICATION_NUM and LAST_UPD) of the records that were loaded into the data warehouse for the prune time period.
S_ETL_D_IMG. Used to store the primary key of the records that are deleted in the source transactional system.
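All three image tables share essentially the same narrow layout, and the %SUFFIX placeholder that appears in the SQL later in this section refers to the numbered copy of the image tables that DAC assigns to each source table. The DDL below is only an illustrative sketch; the data types and the suffix value are assumptions, not the exact Siebel definitions.
CREATE TABLE S_ETL_I_IMG_1
(
ROW_ID VARCHAR2(15) NOT NULL, -- primary key of the source record
MODIFICATION_NUM NUMBER(10), -- version counter copied from the source row
OPERATION CHAR(1), -- 'I' for created/modified records, 'D' for deleted records
LAST_UPD DATE -- last update timestamp copied from the source row
);
-- S_ETL_R_IMG_n and S_ETL_D_IMG_n carry the same ROW_ID, MODIFICATION_NUM, and LAST_UPD columns.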
Full and Incremental Change Capture Processes:-
The full change capture process (for a first full load) does the following:
Inserts into the S_ETL_R_IMG table the records that were created or modified during the prune time period.
Creates a view on the base table. For example, a view V_CONTACT would be created for the base table S_CONTACT.
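During a full load the view amounts to a simple projection of the base table. The statement below is only a sketch based on the S_CONTACT example above; DAC generates the actual view definition.
CREATE VIEW V_CONTACT AS
SELECT *
FROM S_CONTACT;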
The incremental change capture process (for subsequent incremental loads) does the following:
Queries for all records that have changed in the transactional tables since the last ETL date, filters them against the records in the S_ETL_R_IMG table, and inserts them into the S_ETL_I_IMG table.
Queries the S_ETL_D_IMG table for the records that were deleted in the source transactional system and inserts them into the S_ETL_I_IMG table.
Removes the duplicates in the S_ETL_I_IMG table. This is essential for databases where "dirty reads" (queries returning uncommitted data from other transactions) are allowed.
Creates a view that joins the base table with the corresponding S_ETL_I_IMG table.
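Continuing the S_CONTACT example, the incremental view restricts the base table to the rows captured in its image table. This is only a sketch of the join described above, with an assumed image-table suffix of 1; DAC generates the actual view definition.
CREATE VIEW V_CONTACT AS
SELECT S_CONTACT.*
FROM S_CONTACT, S_ETL_I_IMG_1
WHERE S_CONTACT.ROW_ID = S_ETL_I_IMG_1.ROW_ID;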
Performance Tips for Siebel Sources:-
Performance Tip: Reduce Prune Time Period:-
Reducing the prune time period (in the Connectivity Parameters subtab of the Execution Plans tab) can improve performance because, with a lower prune time period, the S_ETL_R_IMG table will contain fewer rows. The default prune time period is 2 days. You can reduce it to a minimum of 1 day.
Note: If your organization has mobile users, when setting the prune time period, you must consider the lag time that may exist between the timestamp of the transactional system and the mobile users' local timestamp. You
should interview your business users to determine the potential lag time, and then set the prune time period accordingly.
Performance Tip: Eliminate S_ETL_R_IMG From the Change Capture Process
If your Siebel implementation does not have any mobile users (which can cause inaccuracies in the values of the "LAST_UPD" attribute), you can simplify the change capture process by doing the following:
Removing the S_ETL_R_IMG table.
Using the LAST_REFRESH_DATE rather than PRUNED_LAST_REFRESH_DATE.
To override the default DAC behavior, add the following SQL to the customsql.xml file, placing it before the last line in the file (the file's closing tag). The customsql.xml file is located in the dac\CustomSQLs directory.
TRUNCATE TABLE S_ETL_I_IMG_%SUFFIX
;
TRUNCATE TABLE S_ETL_D_IMG_%SUFFIX
;
TRUNCATE TABLE S_ETL_I_IMG_%SUFFIX
;
INSERT %APPEND INTO S_ETL_I_IMG_%SUFFIX
(ROW_ID, MODIFICATION_NUM, OPERATION, LAST_UPD)
SELECT
ROW_ID
,MODIFICATION_NUM
,'I'
,LAST_UPD
FROM
%SRC_TABLE
WHERE
%SRC_TABLE.LAST_UPD > %LAST_REFRESH_TIME
%FILTER
;
INSERT %APPEND INTO S_ETL_I_IMG_%SUFFIX
(ROW_ID, MODIFICATION_NUM, OPERATION, LAST_UPD)
SELECT
ROW_ID
,MODIFICATION_NUM
,'D'
,LAST_UPD
FROM
S_ETL_D_IMG_%SUFFIX
WHERE NOT EXISTS
(
SELECT
'X'
FROM
S_ETL_I_IMG_%SUFFIX
WHERE
S_ETL_I_IMG_%SUFFIX.ROW_ID = S_ETL_D_IMG_%SUFFIX.ROW_ID
)
;
DELETE
FROM S_ETL_D_IMG_%SUFFIX
WHERE
EXISTS
(SELECT
'X'
FROM
S_ETL_I_IMG_%SUFFIX
WHERE
S_ETL_D_IMG_%SUFFIX.ROW_ID = S_ETL_I_IMG_%SUFFIX.ROW_ID
AND S_ETL_I_IMG_%SUFFIX.OPERATION = 'D'
)
;
Performance Tip: Omit the Process to Eliminate Duplicate Records
When the Siebel change capture process runs on a live transactional system, it can run into deadlock issues when DAC queries for the records that changed since the last ETL process. To alleviate this problem, you need to enable "dirty reads" on the machine where the ETL is run. If the transactional system is on a database that requires "dirty reads" for change capture, such as MSSQL, DB2, or DB2-390, it is possible that the record identifier column (ROW_ID) inserted into the S_ETL_I_IMG table may contain duplicates. Before starting the ETL process, DAC eliminates such duplicate records so that only the record with the smallest MODIFICATION_NUM is kept. The SQL used by DAC is as follows:
SELECT
ROW_ID, LAST_UPD, MODIFICATION_NUM
FROM
S_ETL_I_IMG_%SUFFIX A
WHERE EXISTS
(
SELECT B.ROW_ID, COUNT(*) FROM S_ETL_I_IMG_%SUFFIX B
WHERE B.ROW_ID = A.ROW_ID
AND B.OPERATION = 'I'
AND A.OPERATION = 'I'
GROUP BY
B.ROW_ID
HAVING COUNT(*) > 1
)
AND A.OPERATION = 'I'
ORDER BY 1,2
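The query above only identifies the ROW_ID values that occur more than once; the cleanup that follows keeps the row with the smallest MODIFICATION_NUM. The DELETE below is one way to express that cleanup in plain SQL and is shown for illustration only; it is not necessarily the statement DAC itself issues.
DELETE
FROM S_ETL_I_IMG_%SUFFIX
WHERE OPERATION = 'I'
AND EXISTS
(
SELECT 'X'
FROM S_ETL_I_IMG_%SUFFIX B
WHERE B.ROW_ID = S_ETL_I_IMG_%SUFFIX.ROW_ID
AND B.OPERATION = 'I'
AND B.MODIFICATION_NUM < S_ETL_I_IMG_%SUFFIX.MODIFICATION_NUM -- keep only the smallest MODIFICATION_NUM per ROW_ID
)
;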
However, for situations where deadlocks and "dirty reads" are not an issue, you can omit the duplicate-detection step by overriding it with the following SQL block, whose WHERE 1=2 predicate never matches any rows. Copy the SQL block into the customsql.xml file, placing it before the last line in the file (the file's closing tag). The customsql.xml file is located in the dac\CustomSQLs directory.
SELECT
ROW_ID, LAST_UPD, MODIFICATION_NUM
FROM
S_ETL_I_IMG_%SUFFIX A
WHERE 1=2
Performance Tip: Manage Change Capture Views
DAC drops and re-creates the incremental views for every ETL process. This is done because DAC anticipates that the transactional system may add new columns to tables to track new attributes in the data warehouse. If you do not anticipate such changes in the production environment, you can set the DAC system property "Drop and Create Change Capture Views Always" to "false" so that DAC will not drop and re-create the incremental views. On DB2 and DB2-390 databases, dropping and creating views can cause deadlock issues on the system catalog tables. Therefore, if your transactional database type is DB2 or DB2-390, you may want to consider setting the DAC system property "Drop and Create Change Capture Views Always" to "false." For other database types, this action may not enhance performance.
Note: If new columns are added to the transactional system and the ETL process is modified to extract data from those columns, and if views are not dropped and created, you will not see the new column definitions in the
view, and the ETL process will fail.
Performance Tip: Determine Whether Informatica Filters on Additional Attributes
DAC populates the S_ETL_I_IMG tables by querying only for data that changed since the last ETL process. This may cause all of the records that were created or updated since the last refresh time to be extracted.
However, the extract processes in Informatica may be filtering on additional attributes. Therefore, for long-running change capture tasks, you should inspect the Informatica mapping to see if it has additional WHERE
clauses not present in the DAC change capture process. You can modify the DAC change capture process by adding a filter clause for a
is located in the dac\CustomSQLs directory.
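As an illustration of where such a filter belongs, the example below adds a predicate at the position of the %FILTER placeholder in the change capture INSERT. The S_CONTACT table and the CONTACT_FLG = 'Y' condition are assumptions standing in for whatever WHERE clause the Informatica mapping actually applies.
INSERT %APPEND INTO S_ETL_I_IMG_%SUFFIX
(ROW_ID, MODIFICATION_NUM, OPERATION, LAST_UPD)
SELECT
ROW_ID
,MODIFICATION_NUM
,'I'
,LAST_UPD
FROM
S_CONTACT
WHERE
S_CONTACT.LAST_UPD > %PRUNED_LAST_REFRESH_TIME
AND S_CONTACT.CONTACT_FLG = 'Y' -- hypothetical filter mirroring the Informatica mapping
;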
SQL for Change Capture and Change Capture Sync Processes
The SQL blocks used for the change capture and change capture sync processes are as follows:
TRUNCATE TABLE S_ETL_I_IMG_%SUFFIX
;
TRUNCATE TABLE S_ETL_R_IMG_%SUFFIX
;
TRUNCATE TABLE S_ETL_D_IMG_%SUFFIX
;
INSERT %APPEND INTO S_ETL_R_IMG_%SUFFIX
(ROW_ID, MODIFICATION_NUM, LAST_UPD)
SELECT
ROW_ID
,MODIFICATION_NUM
,LAST_UPD
FROM
%SRC_TABLE
WHERE
LAST_UPD > %PRUNED_ETL_START_TIME
%FILTER
;
TRUNCATE TABLE S_ETL_I_IMG_%SUFFIX
;
INSERT %APPEND INTO S_ETL_I_IMG_%SUFFIX
(ROW_ID, MODIFICATION_NUM, OPERATION, LAST_UPD)
SELECT
ROW_ID
,MODIFICATION_NUM
,'I'
,LAST_UPD
FROM
%SRC_TABLE
WHERE
%SRC_TABLE.LAST_UPD > %PRUNED_LAST_REFRESH_TIME
%FILTER
AND NOT EXISTS
(
SELECT
ROW_ID
,MODIFICATION_NUM
,'I'
,LAST_UPD
FROM
S_ETL_R_IMG_%SUFFIX
WHERE
S_ETL_R_IMG_%SUFFIX.ROW_ID = %SRC_TABLE.ROW_ID
AND S_ETL_R_IMG_%SUFFIX.MODIFICATION_NUM = %SRC_TABLE.MODIFICATION_NUM
AND S_ETL_R_IMG_%SUFFIX.LAST_UPD = %SRC_TABLE.LAST_UPD
)
;
INSERT %APPEND INTO S_ETL_I_IMG_%SUFFIX
(ROW_ID, MODIFICATION_NUM, OPERATION, LAST_UPD)
SELECT
ROW_ID
,MODIFICATION_NUM
,'D'
,LAST_UPD
FROM
S_ETL_D_IMG_%SUFFIX
WHERE NOT EXISTS
(
SELECT
'X'
FROM
S_ETL_I_IMG_%SUFFIX
WHERE
S_ETL_I_IMG_%SUFFIX.ROW_ID = S_ETL_D_IMG_%SUFFIX.ROW_ID
)
;
DELETE
FROM S_ETL_D_IMG_%SUFFIX
WHERE
EXISTS
(
SELECT
'X'
FROM
S_ETL_I_IMG_%SUFFIX
WHERE
S_ETL_D_IMG_%SUFFIX.ROW_ID = S_ETL_I_IMG_%SUFFIX.ROW_ID
AND S_ETL_I_IMG_%SUFFIX.OPERATION = 'D'
)
;
DELETE
FROM S_ETL_I_IMG_%SUFFIX
WHERE LAST_UPD < %PRUNED_ETL_START_TIME
;
DELETE
FROM S_ETL_I_IMG_%SUFFIX
WHERE LAST_UPD > %ETL_START_TIME
;
DELETE
FROM S_ETL_R_IMG_%SUFFIX
WHERE
EXISTS
(
SELECT
'X'
FROM
S_ETL_I_IMG_%SUFFIX
WHERE
S_ETL_R_IMG_%SUFFIX.ROW_ID = S_ETL_I_IMG_%SUFFIX.ROW_ID
)
;
INSERT %APPEND INTO S_ETL_R_IMG_%SUFFIX
(ROW_ID, MODIFICATION_NUM, LAST_UPD)
SELECT
ROW_ID
,MODIFICATION_NUM
,LAST_UPD
FROM
S_ETL_I_IMG_%SUFFIX
;
DELETE FROM S_ETL_R_IMG_%SUFFIX WHERE LAST_UPD < %PRUNED_ETL_START_TIME
;