
Pentaho Kettle program in Java to merge multiple CSV files by column

I have two CSV files, employee.csv and loan.csv.

employee.csv has four columns: empid (Integer), name (String), age (Integer), education (String).

loan.csv has three columns: loan (Double), balance (Double), empid (Integer).

I want to merge these two files into a single CSV file, joined on the empid column. The columns of result.csv should be:

  • empid(Integer),
  • name(String),
  • age(Integer),
  • education(String),
  • loan(Double),
  • balance(Double).

I also have to do this using only the Kettle API from a Java program. Can anyone please help me?

First of all, you need to create a Kettle transformation as follows:

  1. Add two "CSV Input" steps, one for employee.csv and another for loan.csv.
  2. Hop both inputs to a "Stream Lookup" step and look up on the "empid" field.
  3. Final step: add a "Text file output" step to write the merged CSV file.
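To make it concrete what the lookup in step 2 is expected to produce, here is a minimal plain-Java sketch of the same join. It does not use the Kettle API; it only illustrates the semantics under some assumptions that are not in the question: both files have a header row, fields contain no quoted commas, and an employee without a loan keeps empty loan and balance fields. The class name `CsvLookupSketch` and the sample rows are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CsvLookupSketch {

    // Joins employee rows with loan rows on empid, in the spirit of a stream lookup:
    // the loan rows are loaded into a map first, then each employee row probes it.
    static List<String> merge(List<String> employeeLines, List<String> loanLines) {
        // loan.csv layout from the question: loan,balance,empid
        Map<String, String> loanByEmpId = new HashMap<>();
        for (String line : loanLines.subList(1, loanLines.size())) { // skip header row
            String[] f = line.split(",", -1);
            loanByEmpId.put(f[2].trim(), f[0].trim() + "," + f[1].trim());
        }

        List<String> result = new ArrayList<>();
        result.add("empid,name,age,education,loan,balance");
        // employee.csv layout from the question: empid,name,age,education
        for (String line : employeeLines.subList(1, employeeLines.size())) {
            String[] f = line.split(",", -1);
            // Assumption: no matching loan leaves the loan and balance fields empty.
            String loan = loanByEmpId.getOrDefault(f[0].trim(), ",");
            result.add(line.trim() + "," + loan);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample data in the column order described in the question.
        List<String> employees = Arrays.asList(
                "empid,name,age,education", "1,Ann,30,BSc", "2,Raj,41,MSc");
        List<String> loans = Arrays.asList(
                "loan,balance,empid", "1000.0,250.0,1");
        merge(employees, loans).forEach(System.out::println);
    }
}
```

If the output of the real transformation differs from this shape (for example in how unmatched employees are handled), the lookup key or the default values in the Stream Lookup step are the first things to check.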

I have placed the .ktr code here.

Secondly, if you want to execute this transformation from Java, I suggest you read this blog, where I have explained how to execute a .ktr/.kjb file using Java.


Extra points:

If the names of the CSV files need to be passed as parameters from the Java code, you can do that by adding the code below:

  trans.setParameterValue(parameterName, parameterValue);

where parameterName is the name of a parameter defined in the transformation and parameterValue is the file name or location.

I have already defined the file names as parameters in the Kettle code I have shared.
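Putting the two parts together, a minimal sketch of running the transformation from Java with parameters might look like the following. It needs the Kettle (kettle-core and kettle-engine) libraries on the classpath. The file name `merge_csv.ktr` and the parameter names `EMPLOYEE_FILE` and `LOAN_FILE` are placeholders I made up; use whatever names are actually declared in your .ktr. As far as I know, `setParameterValue` fails with an exception if the parameter is not declared in the transformation.

```java
import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.core.exception.KettleException;
import org.pentaho.di.trans.Trans;
import org.pentaho.di.trans.TransMeta;

public class RunMergeTransformation {

    public static void main(String[] args) throws KettleException {
        // Initialise the Kettle runtime (plugins, step registry) once per JVM.
        KettleEnvironment.init();

        // Load the transformation definition; the path is a placeholder.
        TransMeta transMeta = new TransMeta("merge_csv.ktr");
        Trans trans = new Trans(transMeta);

        // Hypothetical parameter names; they must match the ones declared in the .ktr.
        trans.setParameterValue("EMPLOYEE_FILE", "employee.csv");
        trans.setParameterValue("LOAN_FILE", "loan.csv");
        // Make the parameter values visible as variables before execution.
        trans.activateParameters();

        // Start all steps and block until the transformation has finished.
        trans.execute(null);
        trans.waitUntilFinished();

        if (trans.getErrors() > 0) {
            throw new RuntimeException("Transformation finished with errors");
        }
    }
}
```

Inside the transformation, the CSV Input steps would then refer to the parameters in their file name fields, for example as `${EMPLOYEE_FILE}`.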

Hope it helps :)

