
Read multiple CSV files in Apache Beam using Java

This code works well with just one file as input, but when I pass:

  • D://beam//csv//*.csv
  • or D://beam//csv//20*.csv

as the parameter, it throws:
Exception in thread "main" org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.nio.file.InvalidPathException: Illegal char <*> at index 17: D:\\beam\\csv\\20*.csv
    at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:332)
    at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:302)
    at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:197)
    at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:64)
    at org.apache.beam.sdk.Pipeline.run(Pipeline.java:313)
    at org.apache.beam.sdk.Pipeline.run(Pipeline.java:299)
    at beam.wordcount.TestCsv.main(TestCsv.java:60)
Caused by: java.nio.file.InvalidPathException: Illegal char <*> at index 17: D:\\beam\\csv\\20*.csv
    at sun.nio.fs.WindowsPathParser.normalize(Unknown Source)
    at sun.nio.fs.WindowsPathParser.parse(Unknown Source)
    at sun.nio.fs.WindowsPathParser.parse(Unknown Source)
    at sun.nio.fs.WindowsPath.parse(Unknown Source)
    at sun.nio.fs.WindowsFileSystem.getPath(Unknown Source)
    at java.nio.file.Paths.get(Unknown Source)
    at org.apache.beam.sdk.io.LocalFileSystem.matchOne(LocalFileSystem.java:217)
    at org.apache.beam.sdk.io.LocalFileSystem.match(LocalFileSystem.java:90)
    at org.apache.beam.sdk.io.FileSystems.match(FileSystems.java:119)
    at org.apache.beam.sdk.io.FileSystems.match(FileSystems.java:140)
    at org.apache.beam.sdk.io.FileSystems.match(FileSystems.java:152)
    at org.apache.beam.sdk.io.FileIO$MatchAll$MatchFn.process(FileIO.java:636)

I don't know why it is throwing an error; * is supposed to match multiple files of a similar type.

Code

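import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.options.Default;
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.values.PCollection;

public interface BatchOptions extends PipelineOptions {
    @Description("Path to the data file(s) containing game data.")
    @Default.String("D:\\beam\\csv\\2020.csv")
    String getInput();
    void setInput(String value);
}

public static void main(String[] args) {
    BatchOptions options =
        PipelineOptionsFactory.fromArgs(args).withValidation().as(BatchOptions.class);
    Pipeline pipeline = Pipeline.create(options);
    // Match the input file pattern, then open each matched file for reading.
    PCollection<FileIO.ReadableFile> lines = pipeline
        .apply(FileIO.match().filepattern(options.getInput()))
        .apply(FileIO.readMatches());
    pipeline.run().waitUntilFinish();
}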

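WindowsFileSystem does not expand * and treats it as an illegal character in a path. The stack trace shows this: LocalFileSystem.matchOne passes the unexpanded pattern to java.nio.file.Paths.get, which rejects it on Windows. I would recommend passing the complete directory instead, for example D://beam//csv//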
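
If only some of the files in the directory are wanted (for example the 20*.csv case from the question), one possible workaround is to expand the pattern yourself before the pipeline is built. The sketch below uses only the JDK: Files.newDirectoryStream takes the glob as a separate argument, so the directory path handed to Paths.get never contains a *. The class name GlobExpander and the method expand are made up for this illustration and are not part of Beam. Feeding the resulting list into the pipeline (for instance with Create.of(paths) followed by FileIO.matchAll() in place of FileIO.match().filepattern(...)) is an assumption that has not been verified against the Beam version used in the question.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper (not a Beam class): expands a glob inside one directory.
final class GlobExpander {

    private GlobExpander() {}

    /**
     * Returns the absolute paths of the regular files directly inside dir
     * whose file names match glob, sorted lexicographically.
     */
    static List<String> expand(String dir, String glob) throws IOException {
        List<String> matches = new ArrayList<>();
        // The glob is a separate argument, so Paths.get only sees a plain directory.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(Paths.get(dir), glob)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    matches.add(p.toAbsolutePath().toString());
                }
            }
        }
        Collections.sort(matches);
        return matches;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: GlobExpander <directory> <glob>");
            return;
        }
        // Print one matching path per line.
        for (String path : expand(args[0], args[1])) {
            System.out.println(path);
        }
    }
}
```

Called as GlobExpander.expand("D:\\beam\\csv", "20*.csv"), this would return only the files in that directory whose names start with 20 and end in .csv; subdirectories are not searched.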


