
Is there any way to merge rows to fill null values in Talend Open Studio?

I'm having difficulty with a task in Talend Open Studio.

My question is: how can I fill the null values in a column with the non-null values from the same column on other rows that share the same key?

Suppose that I have source data like this.

EmployeeID | Part A Columns | Part B Columns | Part C Columns
EE1000001 | Part A Values | null | null
EE1000001 | null | Part B Values | null
EE1000001 | null | Part B Values | null
EE1000001 | null | null | Part C Values
EE1000001 | null | null | Part C Values
EE1000001 | null | null | Part C Values
EE1000002 | Part A Values | null | null
EE1000002 | null | Part B Values | null
EE1000002 | null | null | Part C Values


And I'd like to get a result like the following:

EmployeeID | Part A Columns | Part B Columns | Part C Columns
EE1000001 | Part A Values | Part B Values | Part C Values
EE1000001 | null | Part B Values | Part C Values
EE1000001 | null | null | Part C Values
EE1000002 | Part A Values | Part B Values | Part C Values

I've tried several approaches, but none of them worked.

If you have an idea, please help me.

** Added

More intuitive example

So each key may have multiple values for the same column.

They should not be combined into a single row with commas like "C-1, C-2, C-3".

Instead, they should be filled in from the top, starting at the first row with the same key.

This is why the first ID has three rows while the second one has only one row.

Use a tMap with a coalesce-like expression. In the tMap you can join the two datasets (by default it does a left join, which is what you want here) and then use an expression like this:

A == null ? B : A

That should give you what you need.
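For reference, here is a minimal plain-Java sketch of what that expression does, generalized to any number of candidates. The class name CoalesceSketch and the helper coalesce are hypothetical names used only for illustration; they are not Talend routines.

```java
final class CoalesceSketch {
    // Returns the first non-null argument, or null if every argument is null.
    @SafeVarargs
    static <T> T coalesce(T... candidates) {
        for (T candidate : candidates) {
            if (candidate != null) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String a = null;
        String b = "Part B Values";
        // Same result as the tMap expression: a == null ? b : a
        System.out.println(coalesce(a, b)); // prints "Part B Values"
    }
}
```

In a tMap output expression the ternary form is usually enough; a helper like this only starts to pay off when more than two columns have to be checked.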

I figured out a solution myself, so I'll share it.

The solution relies on the tDenormalize component plus an additional key value for each row.

If you use tDenormalize alone, without the extra key column, each key ends up as a single row in which multiple values are packed into one column, separated by the delimiter you specified. That is exactly what I said I didn't want.

To get the result I asked for in the question, give each row an additional key value.

I did something like this as a preliminary step:

row2.tmpKey = Numeric.sequence(row1.EmployeeID + "PartA", 1, 1);

So the raw data would look like this:

EE_ID,ColumnA,ColumnB,ColumnC,TmpKey
EE001,Part A value,null,null,1
EE001,null,Part B value,null,1
EE001,null,Part B value,null,2
EE001,null,null,Part C value,1
...
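
To show how those TmpKey values come about, here is a rough plain-Java sketch that mimics a named counter with a HashMap. It assumes the counter returns the start value on its first call for a given name and adds the step on each later call, and that the sequence name is the employee ID plus whichever part column is populated. The class TmpKeySketch and its methods are hypothetical; this is not Talend-generated code.

```java
import java.util.HashMap;
import java.util.Map;

final class TmpKeySketch {
    private final Map<String, Integer> counters = new HashMap<>();

    // Returns start on the first call for a name, then previous + step afterwards.
    int next(String name, int start, int step) {
        Integer previous = counters.get(name);
        int value = (previous == null) ? start : previous + step;
        counters.put(name, value);
        return value;
    }

    // Assumption: exactly one of a, b, c is non-null on each input row.
    int tmpKeyFor(String eeId, String a, String b, String c) {
        String part = (a != null) ? "PartA" : (b != null) ? "PartB" : "PartC";
        return next(eeId + part, 1, 1);
    }

    public static void main(String[] args) {
        TmpKeySketch keys = new TmpKeySketch();
        String[][] rows = {
            {"EE001", "Part A value", null, null},
            {"EE001", null, "Part B value", null},
            {"EE001", null, "Part B value", null},
            {"EE001", null, null, "Part C value"},
        };
        for (String[] r : rows) {
            int tmpKey = keys.tmpKeyFor(r[0], r[1], r[2], r[3]);
            // Null columns are printed as the text "null", as in the sample data.
            System.out.println(r[0] + "," + r[1] + "," + r[2] + "," + r[3] + "," + tmpKey);
        }
    }
}
```

Each (employee, part) pair gets its own counter, so the first Part B row and the first Part C row of an employee both receive 1 and can later be lined up on the same output row.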

Then, in the Basic settings of the tDenormalize component view, set "To denormalize" to the columns ColumnA, ColumnB, and ColumnC. The remaining columns (EE_ID and TmpKey) then act as the grouping key.
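
The effect of grouping on EE_ID plus TmpKey can be sketched in plain Java as follows. This only illustrates the logic under the assumption that each group holds at most one non-null value per column; it keeps the first non-null value instead of joining values with a delimiter, so it is not a reimplementation of tDenormalize. The class MergeSketch and the row layout {eeId, columnA, columnB, columnC, tmpKey} are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class MergeSketch {
    // Each row is {eeId, columnA, columnB, columnC, tmpKey}.
    static List<String[]> merge(List<String[]> rows) {
        // LinkedHashMap keeps groups in the order they first appear.
        Map<String, String[]> merged = new LinkedHashMap<>();
        for (String[] row : rows) {
            String groupKey = row[0] + "|" + row[4];
            String[] target = merged.get(groupKey);
            if (target == null) {
                merged.put(groupKey, row.clone());
                continue;
            }
            // Fill only the columns that are still null in the merged row.
            for (int i = 1; i <= 3; i++) {
                if (target[i] == null) {
                    target[i] = row[i];
                }
            }
        }
        return new ArrayList<>(merged.values());
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"EE001", "Part A value", null, null, "1"});
        rows.add(new String[] {"EE001", null, "Part B value", null, "1"});
        rows.add(new String[] {"EE001", null, "Part B value", null, "2"});
        rows.add(new String[] {"EE001", null, null, "Part C value", "1"});
        rows.add(new String[] {"EE001", null, null, "Part C value", "2"});
        rows.add(new String[] {"EE001", null, null, "Part C value", "3"});
        for (String[] row : merge(rows)) {
            System.out.println(String.join(",", row));
        }
    }
}
```

With the six sample rows above, this yields three rows for EE001: one with all three parts, one with Part B and Part C, and one with Part C only, which matches the shape of the result asked for in the question.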


 