Thursday, April 3, 2014

Separating duplicate and non-duplicate rows into separate tables

Step 1: Drag the source into the mapping and connect it to an Aggregator transformation.
[Image: scenario 3, source to Aggregator]
Step 2: In the Aggregator transformation, group by the key column and add a new port named count_rec that counts the key column.
Step 3: Connect a Router transformation to the Aggregator from the previous step. In the Router, create two groups, one named "original" and the other named "duplicate".
For the original group, set the filter condition to count_rec=1; for the duplicate group, set it to count_rec>1.
[Image: scenario 3, Aggregator to Router]
The picture below shows the group names and their filter conditions.
[Image: scenario 3, Router groups and filter conditions]
Step 4: Connect the two groups to their corresponding target tables.
[Image: scenario 3, Router to targets]
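
For readers who want to check the logic outside the mapping designer, the steps above can be sketched in plain Python. This is only an illustration of the group-count-route idea, not Informatica code: the function name split_by_key_count, the sample rows, and the column names id and name are all hypothetical. It also assumes that the Aggregator passes through the last row of each group for ports that are not aggregated, so each key produces exactly one output row.

```python
def split_by_key_count(rows, key):
    """Mimic an Aggregator (group by key, count) followed by a two-group Router.

    rows: list of dicts, one per source row (hypothetical sample structure).
    key:  name of the key column to group by.
    Returns (original, duplicate): one output row per key in each list.
    """
    # Aggregator step: one entry per key, holding a running count and
    # the last row seen for that key (assumed pass-through behavior).
    groups = {}
    for row in rows:
        k = row[key]
        if k in groups:
            groups[k]["count_rec"] += 1
            groups[k]["row"] = row
        else:
            groups[k] = {"row": row, "count_rec": 1}

    # Router step: send each aggregated row to the group whose
    # filter condition it satisfies.
    original, duplicate = [], []
    for entry in groups.values():
        out = dict(entry["row"], count_rec=entry["count_rec"])
        if out["count_rec"] == 1:      # "original" group: count_rec=1
            original.append(out)
        elif out["count_rec"] > 1:     # "duplicate" group: count_rec>1
            duplicate.append(out)
    return original, duplicate


# Hypothetical sample data: id 1 appears twice, id 2 once.
sample = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 1, "name": "c"},
]
original_rows, duplicate_rows = split_by_key_count(sample, "id")
```

Note that, under this assumption, the duplicate target receives one row per duplicated key rather than every repeated source row, because the grouping collapses each key to a single row before the routing step.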

