
How to remove minus and plus sign duplicates via a Talend job?

I have loaded a local CSV file into a Talend job and need to apply the conditions below to its data.

The CSV file data looks like this:

NO,DATE,MARK
123,2015-03-01,200    
123,2015-03-01,-200    
123,2015-03-01,200    
123,2015-03-01,200 
125,2016-01-01,80

The values "200" and "-200" both appear above. If a row has -200, I need to remove the corresponding +200 row. After that, if several rows have the same NO, DATE and MARK, I need to remove the duplicates too, so that only one remains:

" 123,2015-03-01,200"," 123,2015-03-01,200" = " 123,2015-03-01,200"

In the end, my result should look like this:

 NO,DATE,MARK
 123,2015-03-01,200
 125,2016-01-01,80

After that I need to sum the marks, 200 + 80 = 280, to get 125,2016-01-01,280 . How can I do the above using a Talend job?

Step by step, we can start by removing this:

123,2015-03-01,200    
123,2015-03-01,-200

We can do this by grouping by NO and DATE and summing MARK with the Talend component tAggregateRow . For this pair, we get:

123,2015-03-01,0

Now we can use the component tFilterRow to remove all rows where MARK == 0, and the component tUniqRow to remove duplicate rows.
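
To make the intended result concrete, here is a minimal Python sketch of the cleanup, written outside Talend. It is not the tAggregateRow / tFilterRow / tUniqRow configuration itself: on the full sample, summing all four rows for 123,2015-03-01 gives 400 rather than 0, so the sketch assumes the rule in the question means cancelling each negative MARK against exactly one matching positive MARK and then keeping one copy of whatever remains. The function name `cancel_and_dedupe` and the tuple layout are illustrative assumptions, and MARK is assumed to be a non-zero integer.

```python
from collections import Counter

# Sample rows from the question as (NO, DATE, MARK).
rows = [
    ("123", "2015-03-01", 200),
    ("123", "2015-03-01", -200),
    ("123", "2015-03-01", 200),
    ("123", "2015-03-01", 200),
    ("125", "2016-01-01", 80),
]

def cancel_and_dedupe(rows):
    """Cancel each negative MARK against one matching positive, then drop duplicates."""
    balance = Counter()  # positives minus negatives per (NO, DATE, |MARK|)
    order = []           # keys in first-seen order, to keep the output stable
    for no, date, mark in rows:
        key = (no, date, abs(mark))
        if key not in balance:
            order.append(key)
        # Assumption: MARK is never 0; a positive adds one, a negative removes one.
        balance[key] += 1 if mark > 0 else -1
    # Keep a single row for every key that still has uncancelled positives.
    return [key for key in order if balance[key] > 0]

print(cancel_and_dedupe(rows))
```

In a Talend job the same idea could probably be expressed with a custom step such as tJavaRow or a tMap lookup, but that mapping is a guess and would need checking against the actual job design.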

The last step is to compute the sum of MARK using tAggregateRow and store it in a context variable. Then find the row with the greatest NO and the latest DATE by sorting with tSortRow and keep only that row using tSampleRow . Finally, we can assign the stored sum to that row's MARK .
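
The final step can be sketched the same way in plain Python, again as an illustration and not as Talend code. The sketch assumes the cleaned rows are the two shown in the question's expected result, that NO is numeric, and that DATE is in ISO `YYYY-MM-DD` format so that string comparison orders dates correctly. The name `summarize` is hypothetical.

```python
# Cleaned rows from the question's expected result as (NO, DATE, MARK).
cleaned = [
    ("123", "2015-03-01", 200),
    ("125", "2016-01-01", 80),
]

def summarize(rows):
    """Return (greatest NO, its DATE, sum of all MARK values)."""
    # Equivalent of the tAggregateRow sum stored in a context variable.
    total = sum(mark for _, _, mark in rows)
    # Equivalent of sorting by NO then DATE and keeping the top row.
    no, date, _ = max(rows, key=lambda r: (int(r[0]), r[1]))
    # Assign the stored total to that row's MARK.
    return (no, date, total)

print(summarize(cleaned))  # ('125', '2016-01-01', 280)
```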
