
How to transform data from “Column,Row,Value” format to rows of the Values only in CSV format using Pentaho Kettle (Spoon)

I need to transform CSV files in "Column,Row,Value" format (see "INPUT" below) into rows containing only the Values, each placed in the position given by its "Column" and "Row" values (see "DESIRED OUTPUT" below).

As you can see, every Row 0 Value should be a column header. I have created something that is close to what I need using the sequence:

"CSV file input" -> "Sort rows" (by Row, Column) -> "Row denormalizer" -> "Text file output"

However, in the "Row denormalizer" I am using the Column as the key field. I need the keys to be dynamic instead. They should be taken from the Values (the third column of the input) whose Row value is 0.

Perhaps this is not the best approach.

NOTE: The files will vary in length and number of columns.


INPUT (.csv file):

Column,Row,Value

0,0,Unique ID
0,1,84
0,2,f8
0,3,0d
0,4,ac
1,0,Property Code
1,1,cc040201
1,2,cc040202
1,3,cc040203
1,4,cc040204
2,0,Property Name
2,1,Stone Crest - 9635
2,2,Stone Crest - 9645
2,3,Stone Crest - 9655
2,4,Stone Crest - 9665
3,0,Address
3,1,9635 Granite Ridge
3,2,9645 Granite Ridge
3,3,9655 Granite Ridge
3,4,9665 Granite Ridge

DESIRED OUTPUT (.csv file):

"Unique ID","Property Code","Property Name","Address"
"84","cc040201","Stone Crest - 9635","9635 Granite Ridge"
"f8","cc040202","Stone Crest - 9645","9645 Granite Ridge"
"0d","cc040203","Stone Crest - 9655","9655 Granite Ridge"
"ac","cc040204","Stone Crest - 9665","9665 Granite Ridge"

Helpful input is greatly appreciated.

(As I understand your question, you already know how to turn your input into a stream of rows holding the data listed in "DESIRED OUTPUT"; only the field names are not what you want.)

You just have to clear the "Header" checkbox on the "Content" tab of the "Text file output" step. The first row of the stream, which holds the Row 0 values, then becomes your header line.
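Outside Kettle, the same lateral trick can be sketched in Python. Write the denormalized rows with every field quoted and emit no separate header line, so the first row (the Row 0 values) serves as the header. `write_without_header` is a hypothetical name. `csv.QUOTE_ALL` is assumed here only to reproduce the quoting shown in DESIRED OUTPUT.

```python
import csv
import io


def write_without_header(rows):
    """Serialize rows as CSV with all fields quoted and no extra header line.

    Because no header is emitted, the first row (the Row 0 values)
    becomes the file's header.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


rows = [
    ["Unique ID", "Property Code", "Property Name", "Address"],
    ["84", "cc040201", "Stone Crest - 9635", "9635 Granite Ridge"],
]
print(write_without_header(rows), end="")
# "Unique ID","Property Code","Property Name","Address"
# "84","cc040201","Stone Crest - 9635","9635 Granite Ridge"
```

No field renaming is needed at any point. The stream's own field names are simply never written.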


If for some other reason you did want to change the field names of the stream, you would have to use the Metadata Injection step. That solution would inevitably be messy and error-prone. In the end, ETL processes should work with fixed, well-defined metadata to stay robust. Luckily, your case is easy to solve by thinking laterally.
