“IllegalStateException” error for an Apache Beam pipeline that reads from a CSV file, splits, applies GroupByKey and writes to a text file. Why?

My input data looks like this:

id,vin,url,exteriorColor,interiorColor,design,transmission,lastcrawled,mileage,price,certified,dealerId,historyType,MSRP
114722309,19XVC2F35PR012846,http://www.pohankaacura.com/auto/used-2017-acura-ilx-chantilly-va-near-buckeystown-md/24742881/,Modern Steel,graystone,0,8-Speed Dual-Clutch,2018-02-05 01:49:47 UTC,1646,22550,0,28453

I want to build a Beam pipeline that reads this data from a CSV file, extracts the vin and counts how many times each vin occurs in the file. So I want to group by vin and calculate the count, and I want the final output to be in a flat file. I had missed the @ProcessElement annotation, so I've added it now, but I get a different error and I can't find a solution here either. Below is my code.
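As a plain-Java illustration of the "grab the vin" step (no Beam involved), the sketch below pulls the vin out of one CSV line and skips the header row and malformed lines. The class name `VinParser` and the method `extractVin` are hypothetical, and the naive `split(",")` assumes, as in the sample row, that no field contains a quoted comma.

```java
import java.util.Optional;

public class VinParser {
    // Returns the vin (second column) of a CSV line, or empty for the
    // header row and for lines with fewer than two fields.
    static Optional<String> extractVin(String line) {
        if (line.startsWith("id,")) {
            return Optional.empty(); // header row: "id,vin,url,..."
        }
        String[] fields = line.split(",");
        if (fields.length < 2 || fields[1].isEmpty()) {
            return Optional.empty(); // blank or malformed line
        }
        return Optional.of(fields[1]);
    }

    public static void main(String[] args) {
        String header = "id,vin,url,exteriorColor";
        String row = "114722309,19XVC2F35PR012846,http://www.pohankaacura.com/auto/used-2017-acura-ilx-chantilly-va-near-buckeystown-md/24742881/,Modern Steel,graystone,0,8-Speed Dual-Clutch,2018-02-05 01:49:47 UTC,1646,22550,0,28453";
        System.out.println(extractVin(header)); // Optional.empty
        System.out.println(extractVin(row));    // Optional[19XVC2F35PR012846]
    }
}
```

In the pipeline, the same check could run in a `Filter` or `FlatMapElements` step before `Parse&ConvertToKV`, so the header line never becomes a `(vin, 1)` pair.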

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.*;
import org.apache.beam.sdk.values.KV;

public class p1 {
    public static void main(String[] args) {
        PipelineOptions options = PipelineOptionsFactory.create();

        Pipeline p = Pipeline.create(options);
        // Read the CSV, one line per element. The header line is not skipped,
        // so "vin" itself will appear in the output with a count of 1.
        // Note: "~" is expanded by the shell, not by Java; if no files match, use an absolute path.
        p.apply(TextIO.read().from("~/slow_storage_drive/beam_test_files/one_vin.csv"))

                // Emit (vin, 1) for each line; vin is the second column.
                .apply("Parse&ConvertToKV", MapElements.via(
                        new SimpleFunction<String, KV<String, Integer>>() {
                            public KV<String, Integer> apply(String input){
                                String[] split = input.split(",");
                                String key = split[1];
                                Integer value = 1;
                                return KV.of(key, value);
                            }
                        }
                ))

                // Collect all values for the same vin: (vin, [1, 1, ...]).
                .apply(GroupByKey.<String, Integer>create())

                // Sum the grouped values and format each result as "vin: count".
                .apply("SumOfValuesByKey", ParDo.of(new DoFn<KV<String, Iterable<Integer>>, String>() {
                    @ProcessElement
                    public void processElement(ProcessContext context) {
                        Integer crawlCount = 0;
                        String vin = context.element().getKey();
                        Iterable<Integer> counts = context.element().getValue();
                        for (Integer count : counts){
                            crawlCount += count;
                        }
                        context.output(vin + ": " + crawlCount);
                    }
                }))

                // Write all results to a single file (no sharding).
                .apply(TextIO.write().to("~/slow_storage_drive/beam_example_files/emr_beam_test/final_output").withoutSharding());

        p.run().waitUntilFinish();
    }

}
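To sanity-check what the pipeline should write, the `GroupByKey` step plus the `SumOfValuesByKey` DoFn can be mirrored in plain Java over an in-memory list of (vin, 1) pairs. This is only a local approximation, and the class and method names are hypothetical: Beam does not guarantee any ordering of output lines, whereas this sketch sorts by vin through a `TreeMap`.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class VinCountSketch {
    // Groups (vin, count) pairs by vin, sums the counts per vin, and
    // formats each result the same way as the DoFn: "vin: total".
    static List<String> groupAndSum(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>(); // plays the role of GroupByKey
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            int crawlCount = 0;
            for (int count : entry.getValue()) {
                crawlCount += count;
            }
            lines.add(entry.getKey() + ": " + crawlCount);
        }
        return lines;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        pairs.add(new SimpleEntry<>("19XVC2F35PR012846", 1));
        pairs.add(new SimpleEntry<>("19XVC2F35PR012846", 1));
        pairs.add(new SimpleEntry<>("vin", 1)); // what the unskipped header line produces
        System.out.println(groupAndSum(pairs)); // [19XVC2F35PR012846: 2, vin: 1]
    }
}
```

In Beam itself, the hand-written summing DoFn could likely be replaced by the built-in `Sum.integersPerKey()`, or the key/value step dropped entirely by applying `Count.perElement()` to the vin strings; both are combiners, which can let the runner pre-aggregate before the shuffle.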

I run the program using the following command:

mvn compile -X exec:java -Dexec.mainClass=p1 -Pdirect-runner

I get the following error:

[ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.6.0:java (default-cli) on project emr_beam_test: An exception occured while executing the Java class. java.lang.IllegalStateException: Invisible parameter type of p1$2 arg0 for public p1$2$DoFnInvoker(p1$2) -> [Help 1]

I don't understand what I am doing wrong. Can anyone please help me?

You have to annotate the processElement method of your anonymous DoFn class with the @ProcessElement annotation.

For more information on the annotation, please refer to https://beam.apache.org/releases/javadoc/2.5.0/org/apache/beam/sdk/transforms/DoFn.ProcessElement.html
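Beam identifies a DoFn's per-element method by its annotation rather than by its name, so a method called processElement without @ProcessElement is not recognized. The sketch below imitates that idea using only the JDK reflection API; the `@Process` annotation and the `findAnnotated` helper are hypothetical stand-ins, not Beam's actual implementation.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

public class AnnotationLookup {
    // Hypothetical marker annotation; RUNTIME retention makes it visible to reflection.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Process {}

    static class Annotated {
        @Process
        public String handle(String element) { return "handled " + element; }
    }

    static class NotAnnotated {
        // Same name as in the question, but without the annotation it is never found.
        public String processElement(String element) { return "never found"; }
    }

    // Returns the name of the first declared method carrying @Process, or null if none.
    static String findAnnotated(Class<?> clazz) {
        for (Method m : clazz.getDeclaredMethods()) {
            if (m.isAnnotationPresent(Process.class)) {
                return m.getName();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(findAnnotated(Annotated.class));    // handle
        System.out.println(findAnnotated(NotAnnotated.class)); // null
    }
}
```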

It seems I was getting the invisible parameter type exception because Apache Beam did not support Java 10 at the time. I changed my JAVA_HOME to point to Java 8 instead, and the program worked. I got the idea from this thread: Apache Beam: Invisible parameter type exception
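Since the fix was switching JAVA_HOME from Java 10 to Java 8, it can help to confirm which Java version the program actually runs on. A minimal sketch, relying only on the standard `java.specification.version` system property, which reports "1.8" on Java 8 and "10" on Java 10; the `majorVersion` helper is hypothetical.

```java
public class JavaVersionCheck {
    // Converts "1.8" (Java 8 and earlier scheme) or "10" (Java 9+ scheme)
    // into a single major version number.
    static int majorVersion(String spec) {
        if (spec.startsWith("1.")) {
            spec = spec.substring(2);
        }
        int dot = spec.indexOf('.');
        return Integer.parseInt(dot >= 0 ? spec.substring(0, dot) : spec);
    }

    public static void main(String[] args) {
        int major = majorVersion(System.getProperty("java.specification.version"));
        System.out.println("Running on Java " + major);
        if (major != 8) {
            System.out.println("Warning: this pipeline was reported to work only after switching to Java 8.");
        }
    }
}
```

Running `mvn -v` also prints the Java version and Java home that Maven itself uses, which is what `mvn exec:java` runs the pipeline with.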
