Read multiple CSV files in Apache Beam using Java

This code works with a single input file, but when I pass either of the following as the input parameter, it throws the exception shown below:

  • D://beam//csv//*.csv
  • D://beam//csv//20*.csv

Exception in thread "main" org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.nio.file.InvalidPathException: Illegal char <*> at index 17: D:\\beam\\csv\\20*.csv
    at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:332)
    at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:302)
    at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:197)
    at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:64)
    at org.apache.beam.sdk.Pipeline.run(Pipeline.java:313)
    at org.apache.beam.sdk.Pipeline.run(Pipeline.java:299)
    at beam.wordcount.TestCsv.main(TestCsv.java:60)
Caused by: java.nio.file.InvalidPathException: Illegal char <*> at index 17: D:\\beam\\csv\\20*.csv
    at sun.nio.fs.WindowsPathParser.normalize(Unknown Source)
    at sun.nio.fs.WindowsPathParser.parse(Unknown Source)
    at sun.nio.fs.WindowsPathParser.parse(Unknown Source)
    at sun.nio.fs.WindowsPath.parse(Unknown Source)
    at sun.nio.fs.WindowsFileSystem.getPath(Unknown Source)
    at java.nio.file.Paths.get(Unknown Source)
    at org.apache.beam.sdk.io.LocalFileSystem.matchOne(LocalFileSystem.java:217)
    at org.apache.beam.sdk.io.LocalFileSystem.match(LocalFileSystem.java:90)
    at org.apache.beam.sdk.io.FileSystems.match(FileSystems.java:119)
    at org.apache.beam.sdk.io.FileSystems.match(FileSystems.java:140)
    at org.apache.beam.sdk.io.FileSystems.match(FileSystems.java:152)
    at org.apache.beam.sdk.io.FileIO$MatchAll$MatchFn.process(FileIO.java:636)

I don't know why it throws this error; as far as I understand, * is meant to match multiple files with similar names.

CODE

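import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.options.Default;
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.values.PCollection;

public class TestCsv {

    public interface BatchOptions extends PipelineOptions {
        @Description("Path to the data file(s) containing game data.")
        @Default.String("D:\\beam\\csv\\2020.csv")
        String getInput();
        void setInput(String value);
    }

    public static void main(String[] args) {
        BatchOptions options =
                PipelineOptionsFactory.fromArgs(args).withValidation().as(BatchOptions.class);
        Pipeline pipeline = Pipeline.create(options);
        // Match the input file pattern, then open each matched file for reading.
        PCollection<FileIO.ReadableFile> lines = pipeline
                .apply(FileIO.match().filepattern(options.getInput()))
                .apply(FileIO.readMatches());
        pipeline.run().waitUntilFinish();
    }
}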

WindowsFileSystem does not expand * and instead treats it as an illegal character, which is why Paths.get fails in the stack trace above. I would recommend passing the complete directory instead, for example D://beam//csv//
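
If only some of the files in the directory are wanted (the 20*.csv case), one possible workaround is to expand the glob yourself before building the pipeline, so that * is never handed to Paths.get. The following is a minimal sketch using only the JDK; GlobExpander and expand are hypothetical names introduced for illustration and are not part of Apache Beam. It relies on Files.newDirectoryStream(dir, glob), which applies the glob to file names inside an existing directory.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper (not part of Apache Beam): expands a file-name glob
// inside one directory without passing '*' to Paths.get.
final class GlobExpander {

    // Returns the regular files directly inside dir whose names match glob,
    // sorted by path. Subdirectories are not searched.
    static List<Path> expand(Path dir, String glob) throws IOException {
        List<Path> matches = new ArrayList<>();
        // The glob is matched against file names only, so the directory path
        // itself never contains a wildcard character.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    matches.add(p);
                }
            }
        }
        Collections.sort(matches);
        return matches;
    }

    public static void main(String[] args) throws IOException {
        // Assumed usage: first argument is the directory, second is the glob.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        String glob = args.length > 1 ? args[1] : "*.csv";
        for (Path p : expand(dir, glob)) {
            System.out.println(p.toAbsolutePath());
        }
    }
}
```

The resulting concrete paths contain no wildcard, so they could then be fed to the pipeline one by one, for instance as a collection of strings followed by FileIO.matchAll(). Whether that fits depends on the runner: this sketch assumes the files are on the local disk of the machine that builds the pipeline.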
