
Pentaho Kettle program in Java to merge multiple CSV files by column

I have two CSV files, employee.csv and loan.csv.

employee.csv has four columns: empid(Integer), name(String), age(Integer), education(String).

loan.csv has three columns: loan(Double), balance(Double), empid(Integer).

I want to merge these two files into a single CSV file, joined on the empid column. The columns of result.csv should be:

  • empid(Integer),
  • name(String),
  • age(Integer),
  • education(String),
  • loan(Double),
  • balance(Double).

I also have to do this using only the Kettle API from a Java program. Can anyone please help me?

First of all, you need to create a Kettle transformation as follows:

  1. Add two "CSV file input" steps, one for employee.csv and one for loan.csv.
  2. Hop both inputs into a "Stream lookup" step and look up on "empid".
  3. Final step: add a "Text file output" step to write the merged CSV file.
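To make the join in the steps above concrete, here is a plain-Java sketch of what a lookup on empid amounts to. It is an illustration only and does not use the Kettle API: the class name, method name, and sample rows are made up, and it assumes that an employee with no matching loan row gets empty loan and balance fields, which is how I understand the lookup to behave when no default values are set.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java illustration of a lookup join on empid; not Kettle API code. */
class StreamLookupSketch {

    /**
     * Joins employee rows {empid, name, age, education} with loan rows
     * {loan, balance, empid} on empid. Employees without a matching loan
     * row get empty loan and balance fields (an assumption of this sketch).
     */
    static List<String[]> mergeByEmpId(List<String[]> employees, List<String[]> loans) {
        // Index the lookup stream (loans) by its key column, empid (index 2).
        Map<String, String[]> loanByEmpId = new HashMap<>();
        for (String[] loan : loans) {
            loanByEmpId.put(loan[2], loan);
        }

        // Walk the main stream (employees) and append the looked-up fields.
        List<String[]> merged = new ArrayList<>();
        for (String[] emp : employees) {
            String[] loan = loanByEmpId.get(emp[0]);
            String loanAmount = (loan == null) ? "" : loan[0];
            String balance = (loan == null) ? "" : loan[1];
            merged.add(new String[] {emp[0], emp[1], emp[2], emp[3], loanAmount, balance});
        }
        return merged;
    }

    public static void main(String[] args) {
        // Made-up sample rows, for illustration only.
        List<String[]> employees = new ArrayList<>();
        employees.add(new String[] {"1", "Alice", "30", "BSc"});
        employees.add(new String[] {"2", "Bob", "41", "MSc"});

        List<String[]> loans = new ArrayList<>();
        loans.add(new String[] {"1000.0", "250.5", "1"});

        for (String[] row : mergeByEmpId(employees, loans)) {
            System.out.println(String.join(",", row));
        }
    }
}
```

The loan rows are indexed in memory and the employee rows are streamed past them, so the output keeps the column order asked for in the question: empid, name, age, education, loan, balance.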

I have placed the .ktr code here.

Secondly, if you want to execute this transformation using Java, I suggest you read this blog, where I have explained how to execute a .ktr/.kjb file using Java.


Extra points:

If the names of the CSV files need to be passed in as parameters from the Java code, you can do that by adding the code below:

  trans.setParameterValue(parameterName, parameterValue);

where parameterName is the name of a parameter defined in the transformation and parameterValue is the name or location of the file.

I have already defined the file names as parameters in the Kettle code I have shared.
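For reference, a minimal sketch of running a .ktr from Java and passing file names as parameters could look like the following. It assumes the Kettle (kettle-core and kettle-engine) jars are on the classpath. The file name merge_employee_loan.ktr and the parameter names EMPLOYEE_FILE and LOAN_FILE are hypothetical placeholders, so substitute the path and the parameter names actually declared in your transformation.

```java
import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.core.exception.KettleException;
import org.pentaho.di.trans.Trans;
import org.pentaho.di.trans.TransMeta;

public class RunMergeTransformation {
    public static void main(String[] args) throws KettleException {
        // Initialise the Kettle runtime (plugin registry, logging) once per JVM.
        KettleEnvironment.init();

        // Hypothetical path: point this at the .ktr file saved from Spoon.
        TransMeta transMeta = new TransMeta("merge_employee_loan.ktr");
        Trans trans = new Trans(transMeta);

        // Hypothetical parameter names: they must match parameters
        // declared in the transformation, otherwise Kettle throws
        // an UnknownParamException.
        trans.setParameterValue("EMPLOYEE_FILE", "employee.csv");
        trans.setParameterValue("LOAN_FILE", "loan.csv");

        trans.execute(null);        // start the transformation (no arguments)
        trans.waitUntilFinished();  // block until all steps have finished

        if (trans.getErrors() > 0) {
            throw new IllegalStateException("Transformation finished with errors");
        }
    }
}
```

Set the parameters before calling execute, since the values are picked up when the transformation is prepared for execution.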

Hope it helps :)
