简体   繁体   中英

Processing a single group of rows at once in Talend Open Studio for Data Integration

I have a data source where each row has five fields:

company name; year; code; value;

In my target output row model I want to produce a row like so

company name;year;value1;value2;value3;value4

Where value1,value..N are not concatenation for a single code but rather a "mapping". Ie code 50 => "Total Revenues"

So I need to perform the following logic:

  1. First select all the records for the same company name / year
  2. Then apply some custom java logic that performs the mapping between my codes and my fields of the output row.

This is a in-memory map reduce with about 1M rows. How should this be handled in Talend Open Studio for Data Integration?

select all the records for the same company name / year

You might want to use tAggregate ( https://help.talend.com/display/TalendOpenStudioComponentsReferenceGuide521EN/18.1+tAggregateRow ) to group the flow by company name and year

apply some custom java logic that performs the mapping between my codes and my fields of the output row.

Talend has a component called tMap that allows you to map input fields into output fields.

In your tMap you can use something like:

(assuming that input is the name of the flow into your tMap and output is the name of your flow out of your tMap )

In output.field1 put input.code == 50? input.value : 0 input.code == 50? input.value : 0

In output.field2 put input.code == 60? input.value : 0 input.code == 60? input.value : 0

In output.field2 put input.code == 70? input.value : 0 input.code == 70? input.value : 0

etc

This is assuming you are ok with leaving the field columns with 0 if the value was for another code .

If you want the value for each code to be in a different output row out of the tMap you can use a logic similar to the above, only putting each test ( code == 70? input.value : 0 ) in a different output table, and then filtering out the rows that have 0 (using a tFilter ) after the tMap .

To add output tables you can use the + symbol on the top right of the tMap .

See here for more detials on how to use tMap : https://help.talend.com/display/TalendOpenStudioComponentsReferenceGuide54EN/tMap

I hope this helps!

You could do it like that, and it's essentially the approach Maira Bay already suggested:

  1. Set up your data source to emit those lines one at a time. I used tFixedFlowInput for that. You'd probably have to read from a file.
  2. Optionally sort by company name and year with a tSortRow .
  3. Map with a tMap the value of each line to the corresponding column in the result line with a guard clause like input.code.equals("code for this column") ? input.value : null input.code.equals("code for this column") ? input.value : null .
  4. Aggregate the rows with a tAggregateRow , grouping by company name and year, selecting the first value for each of the value rows - but make sure to ignore the nulls.
  5. Do anything you want with the resulting lines.

I tried that with some sample data, hence the tFixedFlowInput in step 1, and it worked for me on my machine in TOS 6.3.1.

Beware: the solution proposed assumes you only got one value per combination of company name, year and code.

See solution below which I believe will fulfill your precise requirement of taking a delimited file data source and transforming it into a denormalized out as specified above.

First I mocked up a file with the same format as you specified. I made the values a logical concatenation of Company, Year, and sequence. This makes it easy to verify the output.

在此处输入图片说明

Next I use that as an input, run it thru a sorter, then denormalize on the value field. Finally you can see the output in a tLogRow .

在此处输入图片说明

I also included the component view of tDenormalize so you can see how that is done. You can use this technique in any falvor of Talend Open Studio .
在此处输入图片说明

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM