Thursday, April 24, 2014

Set Target Load Order in Informatica


Set Target Load Order in Informatica


When we have multiple interdependent data streams (pipelines) in our mapping, we can configure the Load Order of the Targets. This is often done to maintain referential integrity, maintaining consistent data or in a case where the target of first pipeline is source of the second pipeline in the same mapping.

In our example, we have two pipelines, one each for Student and Employee. Target Load Plan can be set at the mapping level in Designer as follows

1.    Open Mapping
 

1.    Go to mappings - Target Load Plan


Adjust the target group load order as per your requirements using Up, Down arrows

4. Click on OK and you are done!

The target load plan shows Source Qualifiers and respective Targets

Tuesday, April 15, 2014

SOFT-DELETES & HARD-DELETE:

SOFT-DELETES:- DELETE  THE RECORDS FROM THE TARGET TABLE (MEANING, MARKING A RECORD AS 'DELETED' INSTEAD
OF PHYSICALLY DELETING THEM).

HARD-DELETE:-DELETE THE RECORD PERMANENTLY FROM DATABASE.


ETL Perspective on Deletion:-
When we have ‘Soft DELETE’ implemented at the source side, it becomes very easy to track the invalid
transactions and we can tag those transactions in DW accordingly. We just need to filter the
records from source using that STATUS field and issue an UPDATE in DW for the corresponding records.
Few things to be kept in mind in this case.

If only ACTIVE records are supposed to be used in ETL processing, we need to add specific filters
while fetching source data.

Sometimes INACTIVE records are pulled into the DW and moved till the ETL Data Warehouse level.
While pushing the data into Exploration Data Warehouse, only the ACTIVE records are sent for reporting purpose.

For ‘Hard DELETE’, if Audit Table is maintained at source systems for what are transactions were deleted,
we can source the same, i.e. join the Audit table and the Source table based on NK and
logically delete them in DW too. But it becomes quite cumbersome and costly when no account is kept of what
was deleted at all. In these cases, we need to use different ways to track them and update the
corresponding records in DW.

Thursday, April 10, 2014

Oracle 11g introduced the concept of ‘Virtual Column’ within a table.

Oracle 11g introduced the concept of ‘Virtual Column’ within a table.
Virtual Columns are similar to normal table’s columns but with the following differences:

They are defined by an expression. The result of evaluation of this expression becomes the value of the column.
The values of the virtual column are not stored in the database. Rather, it’s computed at run-time
when you query the data.
You can’t update (in SET clause of update statement) the values of virtual column. These are read
only values, that are computed dynamically and any attempt to modify them will result into oracle error.
The syntax for defining a virtual column is:

column_name [datatype] [GENERATED ALWAYS] AS [expression] [VIRTUAL]

Points :-
Virtual columns can be used in the WHERE clause of UPDATE and DELETE statement but they can’t be modified by DML.
Statistics can be collected on them.
They can be used as a partition key in virtual column based partitioning.
Indexes can be created on them. As you might have guessed, oracle would create function based
indexes as we create on normal tables.
Constraints can be created on them.

CREATE TABLE EMPLOYEE
(
    empl_id        NUMBER,
    empl_nm        VARCHAR2(50),
    monthly_sal    NUMBER(10,2),
    bonus          NUMBER(10,2),
    total_sal      NUMBER(10,2) GENERATED ALWAYS AS (monthly_sal*12 + bonus)
);

Here we have defined a virtual column “total_sal” whose value would be dynamically calculated;

SELECT column_name, data_type, data_length, data_default, virtual_column
  FROM user_tab_cols
 WHERE table_name = 'EMPLOYEE';

COLUMN_NAME | DATA_TYPE | DATA_LENGTH | DATA_DEFAULT             | VIRTUAL_COLUMN
EMPL_ID     | NUMBER    | 22          | null                     | NO          
EMPL_NM     | VARCHAR2  | 50          | null                     | NO          
MONTHLY_SAL | NUMBER    | 22          | null                     | NO          
BONUS       | NUMBER    | 22          | null                     | NO          
TOTAL_SAL   | NUMBER    | 22          | "MONTHLY_SAL"*12+"BONUS" | YES  

LIMITATIONS ON VIRTUAL COLUMNS:-

A virtual column can only be of scalar datatype or XMLDATATYE. It can’t be a user defined type, LOB or RAW.
All columns mentioned as part of the virtual column expression should belong to the same table.
No DMLs are allowed on the virtual columns.
The virtual column expression can’t reference any other virtual column.
Virtual columns can only be created on ordinary tables. They can’t be created on index-organized,
external, object, cluster or temporary tables.
If a deterministic function is used as virtual column expression, that virtual column can’t be used as
a partitioning key for virtual column-based partitioning.

Oracle Skip Locked in 11g

Oracle Skip Locked

Oracle 11g introduced SKIP LOCKED clause to query the records from the table which are not locked
in any other active session of the database.

The select for update statement has always been problematic for large updates because it the default
is to wait for locks and using select for update other tasks can abort waiting on access with the ORA-300036 error:

ORA-30006: resource busy; acquire with WAIT timeout expired

In other cases using select for update with the nowait clause you your own update may abort with the ORA-00054 error:

ORA-00054 resource busy and NOWAIT specified

Even worse, if a select for update task aborts, a zombie process may hold the row locks long term,
requiring DBA intervention.

To illustrate, we open two sessions. In the first session, we lock the row with Empno as 10 using FOR UPDATE NOWAIT.

SELECT * FROM Employee  WHERE Empno = 10  FOR UPDATE NOWAIT;

In the second session, we try to lock two rows (Empno10 and 20) from the table emp using FOR UPDATE NOWAIT. An exception is thrown after executing the following statement because one of the row (i.e. Empno10) out of the selected list is already locked by session 1.

SELECT * FROM Employee WHERE Empno IN (10,20) FOR UPDATE NOWAIT;

we get an error ORA-00054: resource busy and acquire with NOWAIT specified

Now we again try to lock two rows (deptno(s) 10 and 20) from the table dept but using the clause FOR UPDATE SKIP LOCKED instead of FOR UPDATE NOWAIT. As you can see the following statement has:

returned the control without throwing an exception  acquired lock on the row (i.e. deptno 20)
which is available for locking
skipped the row (i.e. deptno 10) that has been locked already by session 1
SELECT * FROM Employee WHERE Empno IN (10,20) FOR UPDATE SKIP LOCKED;

Wednesday, April 9, 2014

Invisible Indexes in Oracle Database 11g

Invisible Indexes in Oracle Database 11g

Oracle 11g allows indexes to be marked as invisible. Invisible indexes are maintained like any other index,
but they are ignored by the optimizer unless the OPTIMIZER_USE_INVISIBLE_INDEXES parameter is set to TRUE
at the instance or session level. Indexes can be created as invisible by using the INVISIBLE keyword,
and their visibility can be toggled using the ALTER INDEX command.

    CREATE INDEX index_name ON table_name(column_name) INVISIBLE;

    ALTER INDEX index_name INVISIBLE;
    ALTER INDEX index_name VISIBLE;

Making the index visible means it is still available to the optimizer when the OPTIMIZER_USE_INVISIBLE_INDEXES
 

Data engineering Interview Questions

1)  What all challenges you have faced and how did you overcome from it? Ans:- Challenges Faced and Overcome As a hypothetical Spark develop...