
Pentaho Data Integration transformation, loading fields from csv file (Java API)

I am trying to create a simple transformation using the Kettle Java API.

It has just two steps: one to read from a csv file and another to write the rows into a text file.

Code:

        // Register the plugin types so step metadata can be resolved to plugin ids.
        PluginRegistry.addPluginType(SpoonPluginType.getInstance());
        PluginRegistry.addPluginType(StepPluginType.getInstance());
        PluginRegistry.init();

        TransMeta transMeta = new TransMeta();
        transMeta.setName("testTrans");

        // Step 1: CSV input. No input fields are defined here.
        String csvStep = "read from file ";
        CsvInputMeta csvInputMeta = new CsvInputMeta();
        csvInputMeta.setDefault();
        csvInputMeta.setFilename(INPUT_FILE);
        csvInputMeta.setDelimiter(";");

        String csvId = PluginRegistry.getInstance().getPluginId(csvInputMeta);
        StepMeta stepMeta = new StepMeta(csvId, csvStep, csvInputMeta);
        transMeta.addStep(stepMeta);

        // Step 2: text file output.
        TextFileOutputMeta textFileOutputMeta = new TextFileOutputMeta();
        textFileOutputMeta.setDefault();
        textFileOutputMeta.setFilename(OUTPUT_FILE);
        textFileOutputMeta.setFileFormat("txt");

        String outPutStep = "Output step";
        String outputId = PluginRegistry.getInstance().getPluginId(textFileOutputMeta);
        StepMeta stepMeta2 = new StepMeta(outputId, outPutStep, textFileOutputMeta);
        transMeta.addStep(stepMeta2);

        // Connect the input step to the output step.
        transMeta.addTransHop(new TransHopMeta(stepMeta, stepMeta2));

        // Export the transformation to XML so it can be opened in Spoon.
        String xml = transMeta.getXML();
        DataOutputStream dos = new DataOutputStream(new FileOutputStream(new File("trans.xml")));
        dos.write(xml.getBytes("UTF-8"));
        dos.close();

        // Run the transformation and wait for it to finish.
        Trans trans = new Trans(transMeta);
        trans.execute(null);
        trans.waitUntilFinished();

When I run the above code, the output is:

INFO  18-09 17:32:08,700 - read from file  - Line number : 50000
INFO  18-09 17:32:08,703 - Output step - linenr 50000
INFO  18-09 17:32:09,147 - read from file  - Line number : 100000
INFO  18-09 17:32:09,149 - Output step - linenr 100000
INFO  18-09 17:32:09,491 - read from file  - Line number : 150000
INFO  18-09 17:32:09,492 - Output step - linenr 150000
INFO  18-09 17:32:09,786 - read from file  - Line number : 200000
INFO  18-09 17:32:09,788 - Output step - linenr 200000

and so on. But my csv file actually contains only 4 lines (a header and 3 data rows) that look like this:

id;val
1;10
2;15
3;20

The problem is that the transformation "doesn't know" what the fields are. When I exported the transformation to an xml file, loaded it into Pentaho Spoon and pressed the "Get fields" button, everything worked correctly (only 3 rows were read).

I know I can manually create these fields and set them on csvInputMeta, but is there a way to do this automatically, just like the "Get fields" button in Spoon does?

If anyone is curious, I found a solution.

You have to use your own csv reader to detect the fields.

But you can get some help from the class CsvInputDialog (it's a GUI class). It has methods like getCsv and getInfo; they are private, so you can't call them directly, but you can use them as a model for writing your own method. Then, as @Dirk said, pass the resulting fields to the setInputFields method.

Or you can use an existing csv parser.
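
As a rough illustration of the "own csv reader" idea, here is a minimal sketch in plain Java that reads only the header line of the file and splits it on the delimiter. The class CsvHeaderFields and its methods are hypothetical names, not part of Kettle. The sketch assumes that the first line is a header, that the file is UTF-8, and that field names contain no quoted delimiters. It does not guess field types the way "Get fields" in Spoon does.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of Kettle: reads field names from a csv header line. */
final class CsvHeaderFields {

    private CsvHeaderFields() {
    }

    /** Splits one header line on a literal delimiter and trims each name. */
    static List<String> parseHeader(String headerLine, String delimiter) {
        // Pattern.quote makes the delimiter literal (safe for "|" or "."),
        // and the -1 limit keeps trailing empty columns.
        String[] parts = headerLine.split(Pattern.quote(delimiter), -1);
        List<String> names = new ArrayList<>();
        for (String part : parts) {
            names.add(part.trim());
        }
        return names;
    }

    /** Reads only the first line of the file and returns its field names. */
    static List<String> readFieldNames(Path csvFile, String delimiter) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(csvFile, StandardCharsets.UTF_8)) {
            String header = reader.readLine();
            if (header == null) {
                throw new IOException("Empty csv file: " + csvFile);
            }
            return parseHeader(header, delimiter);
        }
    }
}
```

For the sample file above, readFieldNames(Paths.get(INPUT_FILE), ";") would return the names id and val. Each name would then still have to be turned into an input field object and handed to csvInputMeta via setInputFields, as described in the answer.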
