
Read multiple CSV files in Apache Beam using Java

This code works when the input is a single file, but when I pass either of the following as the argument:

  • D://beam//csv//*.csv
  • D://beam//csv//20*.csv

it throws:
Exception in thread "main" org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.nio.file.InvalidPathException: Illegal char <*> at index 17: D:\\beam\\csv\\20*.csv
    at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:332)
    at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:302)
    at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:197)
    at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:64)
    at org.apache.beam.sdk.Pipeline.run(Pipeline.java:313)
    at org.apache.beam.sdk.Pipeline.run(Pipeline.java:299)
    at beam.wordcount.TestCsv.main(TestCsv.java:60)
Caused by: java.nio.file.InvalidPathException: Illegal char <*> at index 17: D:\\beam\\csv\\20*.csv
    at sun.nio.fs.WindowsPathParser.normalize(Unknown Source)
    at sun.nio.fs.WindowsPathParser.parse(Unknown Source)
    at sun.nio.fs.WindowsPathParser.parse(Unknown Source)
    at sun.nio.fs.WindowsPath.parse(Unknown Source)
    at sun.nio.fs.WindowsFileSystem.getPath(Unknown Source)
    at java.nio.file.Paths.get(Unknown Source)
    at org.apache.beam.sdk.io.LocalFileSystem.matchOne(LocalFileSystem.java:217)
    at org.apache.beam.sdk.io.LocalFileSystem.match(LocalFileSystem.java:90)
    at org.apache.beam.sdk.io.FileSystems.match(FileSystems.java:119)
    at org.apache.beam.sdk.io.FileSystems.match(FileSystems.java:140)
    at org.apache.beam.sdk.io.FileSystems.match(FileSystems.java:152)
    at org.apache.beam.sdk.io.FileIO$MatchAll$MatchFn.process(FileIO.java:636)

I don't understand why this error is thrown, since * is supposed to match multiple files with similar names.

Code:

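import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.options.Default;
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.values.PCollection;

public class TestCsv {

    public interface BatchOptions extends PipelineOptions {
        @Description("Path to the data file(s) containing game data.")
        @Default.String("D:\\beam\\csv\\2020.csv")
        String getInput();
        void setInput(String value);
    }

    public static void main(String[] args) {
        BatchOptions options =
                PipelineOptionsFactory.fromArgs(args).withValidation().as(BatchOptions.class);
        Pipeline pipeline = Pipeline.create(options);
        // Match the input file pattern, then open each matched file.
        PCollection<FileIO.ReadableFile> lines = pipeline
                .apply(FileIO.match().filepattern(options.getInput()))
                .apply(FileIO.readMatches());
        pipeline.run().waitUntilFinish();
    }
}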

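WindowsFileSystem does not expand * and treats it as an illegal character. As the stack trace shows, the pattern reaches java.nio.file.Paths.get unchanged (via LocalFileSystem.matchOne), and the Windows path parser rejects it. I suggest passing the full directory instead, for example D://beam//csv//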
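
If only a subset of the files is needed (for example 20*.csv), one possible workaround is to expand the wildcard yourself before handing anything to Beam. The following is a minimal sketch that uses only the JDK. GlobExpander and expand are hypothetical names chosen for illustration and are not part of Beam; the sketch assumes the wildcard appears only in the file name, not in the directory part.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class GlobExpander {

    // Hypothetical helper (not a Beam API): returns the absolute paths of the
    // regular files in `dir` whose file names match `glob`, e.g. "20*.csv".
    // The directory path itself contains no wildcard, so constructing it does
    // not trigger InvalidPathException on Windows.
    static List<String> expand(Path dir, String glob) throws IOException {
        List<String> matches = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    matches.add(p.toAbsolutePath().toString());
                }
            }
        }
        Collections.sort(matches); // deterministic order
        return matches;
    }

    public static void main(String[] args) throws IOException {
        // Assumed usage: first argument is the directory, second is the glob.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        String glob = args.length > 1 ? args[1] : "*.csv";
        for (String file : expand(dir, glob)) {
            System.out.println(file);
        }
    }
}
```

The resulting list of concrete paths could then be fed into the pipeline, for instance with Create.of(files) followed by FileIO.matchAll() and FileIO.readMatches(), instead of FileIO.match().filepattern(...). I have not tested that on Windows with this Beam version, so treat it as a starting point.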
