Monday, April 7, 2014

Uncheck the OutputmetadataforFF property under Configuration Properties on the Admin Console

Q:- I defined a target file in Informatica with column names and am populating data into a CSV file.
That works fine, but Informatica generates a # in front of the first column name.
Is there a simple way to make Informatica stop doing this?
Sol:-
Go to the appropriate Integration Service on the Admin Console and, under Configuration
Properties, uncheck the OutputmetadataforFF property, i.e., set the value of
this property to No. This will resolve your problem.
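
For example, with a hypothetical two-column target file (EMPNO, ENAME), the first line of the generated CSV changes from

#EMPNO,ENAME

to

EMPNO,ENAME

once the property is set to No.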

Test load In Informatica


This option ensures that the mapping runs fine without any errors or exceptions.

Test load is a very useful property while working in production support. If your session fails for
duplicate-key or NULL issues, you can do a test load by enabling the Test Load option. By doing so, the session
will reject the bad records (NULL/duplicate) while processing them for the target (this is not an ACTUAL load, just
a test run). After the session completes you can see the rejected counts, and from those you can
identify how many records are bad.
Prerequisites:
1. The Test Load option should be enabled.

2. Set the number of rows to be tested (you can mention whatever you need; better to go for larger values, e.g. 100000).

3. Set Stop on Errors to 0 (so the session won't abend while rejecting the bad records).

Last but not least, after doing the test load DISABLE the Test Load option and REVERT Stop on Errors.

Let me know if you have any questions...

Remember two things before a test load:
1) It works only in normal mode (not bulk mode).
2) The Truncate target table option must not be checked.


In simple words, a 'test load' is a load without a 'commit' to the targets! The source qualifier fetches data and all
the rows are processed through all transformations. Even the target insert queries are generated, but the rows are not
inserted.
In the session log you will not find an entry for a commit being issued. Also, in the session load summary,
the entry will be 'Target test load summary' instead of 'Target load summary'.
The advantage of this is already evident in the above explanation. Especially when there is no access to
debug mode, this option is very helpful and can be used with verbose data tracing to check for errors and issues.

Thursday, April 3, 2014

Remove special characters from empno


Scenario: Suppose in a flat file some special symbols like @, %, $, #, & have been added to the empno column along with the actual data. How do we remove those special characters?
empno in source (string format):
7@3%$,21
432#@1
324&*,$2

empno in target:
7321
4321
3242
Following are the steps for achieving this mapping:

  1. Connect the o/p columns of the SQF to an Expression transformation.
  2. In the Expression, make empno an input-only port and create another port, empno1, as an output port with string datatype. In empno1 write a condition like the ones below, and finally send it to the target.


1) Replace all special characters with NULL (empty string), for example:
REPLACESTR(1,Account_PK,';','?','!','#','$','%','^','*','_','=','"','`','|','~','>','<','{}','[]','...',NULL)

2) Strip carriage-return and line-feed characters:
REPLACECHR(1,REPLACECHR(1,Account_Name,CHR(13),''),CHR(10),'')
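
For the empno scenario above, a minimal sketch of the empno1 output-port expression (assuming @, %, $, #, &, * and the comma are the only symbols to strip; the symbol list is illustrative):

-- Option 1: remove each listed symbol explicitly
REPLACESTR(1, empno, '@', '%', '$', '#', '&', '*', ',', NULL)

-- Option 2: keep digits only via a regular expression (if REG_REPLACE is available in your PowerCenter version)
REG_REPLACE(empno, '[^0-9]', '')

With the sample data, either expression turns 7@3%$,21 into 7321.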

Extracting every nth row


Scenario: How do we load every nth row from a flat file / relational DB to the target? Suppose n=3; then rows numbered 3, 6, 9, 12, and so on are loaded, i.e., every 3rd row goes to the target table.
Solution:
  1. Connect an Expression transformation after the source qualifier, and
    add the NEXTVAL port of a Sequence Generator to the Expression transformation.
  2. In the Expression, create a new output port (validate) and write an expression that numbers the rows (see the sketch after this list).
  3. Connect a Filter transformation to the Expression and set its filter condition so that only every nth row passes.
  4. Finally, connect the Filter to the target.
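
Since the original screenshots are unavailable, here is a minimal sketch of the two expressions, assuming n=3 and a Sequence Generator that starts at 1 with an increment of 1 (the port name validate comes from the steps above):

-- validate (output port in the Expression transformation)
MOD(NEXTVAL, 3)

-- Filter transformation condition: pass only every 3rd row
validate = 0

Rows 3, 6, 9, ... give MOD(NEXTVAL, 3) = 0 and pass the filter; all other rows are dropped.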
 

Separating duplicate and non-duplicate rows to separate tables

Step 1: Drag the source into the mapping and connect it to an Aggregator transformation.
Step 2: In the Aggregator transformation, group by the key column and add a new port, count_rec, to count the key column.
Step 3: Connect a Router to the Aggregator from the previous step. In the Router make two groups, one named "original" and another named "duplicate".
In "original" write count_rec=1 and in "duplicate" write count_rec>1.
Step 4: Connect the two groups to the corresponding target tables.
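
A minimal sketch of the Aggregator and Router settings, assuming a hypothetical key column named empno:

-- Aggregator: group by empno
-- count_rec (output port)
COUNT(empno)

-- Router group conditions
-- original:  count_rec = 1
-- duplicate: count_rec > 1

Note that the Aggregator returns one row per key, so the "duplicate" target receives one row per duplicated key rather than every duplicate occurrence.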

Target table rows, with each row as the sum of all previous rows from the source table.

Scenario: How do we produce rows in the target table with every row as the sum of all previous rows in the source table? See the source and target tables below to understand the scenario.
 
SOURCE TABLE
id   sal
1    200
2    300
3    500
4    560

TARGET TABLE
id   sal
1    200
2    500
3    1000
4    1560
  1. Pull the source into the mapping and then connect it to an Expression transformation.
  2. In the Expression, add one output port (sal1) and make the sal port input-only.
    We will make use of a function named CUME() to solve our problem, rather than using any complex mapping. Write the expression in sal1 as CUME(sal) and send the output rows to the target (see the sketch below).
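
A minimal sketch of the Expression transformation ports (port names from the steps above):

-- sal  : input-only port
-- sal1 : output port holding the running total
CUME(sal)

CUME() keeps a running total across rows, so with the sample source the outputs are 200, 500, 1000, and 1560, matching the target table.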

Validating all mappings in a repository


Scenario: How do we validate all mappings in a repository?
Solution:
  1. In the Repository Manager go to the menu "Tools", then "Queries". The Query Browser dialog box will appear. Then click on the New button.

  2. In the Query Editor, choose the folder name and the object type (Mapping); see the sketch after this list, as the original screenshot is unavailable.
  3. After that, execute the query (by clicking the blue arrow button).
  4. The Query Results window will appear. Select a single mapping or all mappings (by pressing Ctrl + A) and go to "Tools", then "Validate", to validate them.
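
A sketch of the query parameters in the Query Editor, assuming we want every mapping in one folder (parameter and operator names are approximate, since the original screenshot is unavailable):

Parameter     Operator      Value
Folder        Is Equal To   <your folder name>
Object Type   Is Equal To   Mapping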
