
Trouble emitting elements to multiple output PCollections in Apache Beam

I'm having trouble using MultiOutputReceiver with a PCollectionTuple and TupleTags in Apache Beam.

logSchema is my Avro-generated class for handling incoming logs (date, type, message, etc.). What I want to do is store the different types of logs (errors, warnings, notices) in different PCollections.

I get this error:

java: incompatible types: logSchema cannot be converted to capture#1 of ?

for every call like out.get(tags.get(0)).output(log); inside processElement of branching extends DoFn<logSchema, logSchema>.

Basically: Required type: capture of ? Provided: logSchema

I've mostly followed the section of the Beam Programming Guide covering additional outputs, along with the other examples I could find here.

Would anyone care to explain what I'm getting wrong? I'm feeling lost but also close.

Edit: It seems I forgot .withOutputTags() on the branching ParDo. I've added it to the code below, but I still get the same incompatible types error. IDEA underlines it in red (it did before as well) and wants a cast to (PCollectionTuple). Why is this necessary?

Here is my code:

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionTuple;
import org.apache.beam.sdk.values.TupleTag;
import org.apache.beam.sdk.values.TupleTagList;
import java.text.ParseException;

public class Pipe extends Thread {

    public Pipe() {
    }

    static class conformToSchema extends DoFn<String, logSchema> {
        @ProcessElement
        public void processElement(@Element String element, OutputReceiver<logSchema> receiver ) throws ParseException {
            logSchema log = new logSchema(element);
            receiver.output(log);
        }
    }

    static class branching extends DoFn<logSchema, logSchema> {
        private TupleTagList tags;
        public branching(TupleTagList tags) {
            this.tags = tags;
        }
        @ProcessElement
        public void processElement(@Element logSchema log, MultiOutputReceiver out ) {
            if (log.getType().equals("[notice]")) out.get(tags.get(0)).output(log);
            else if (log.getType().equals("[error]")) out.get(tags.get(1)).output(log);
            else if (log.getType().equals("[warn]")) out.get(tags.get(2)).output(log);
            else if (log.getType().equals("[sout]") ) out.get(tags.get(3)).output(log);
        }
    }

    public void run(){
        TupleTag<logSchema> all = new TupleTag<>();
        TupleTag<logSchema> noticesTag = new TupleTag<>();
        TupleTag<logSchema> errorsTag = new TupleTag<>();
        TupleTag<logSchema> warningsTag = new TupleTag<>();
        TupleTag<logSchema> soutTag = new TupleTag<>();
        TupleTagList tags = TupleTagList.of(noticesTag).and(errorsTag).and(warningsTag).and(soutTag);

        PipelineOptions options = PipelineOptionsFactory.create();

        Pipeline p = Pipeline.create();
        PCollection<String> input = p.apply(TextIO.read().from("C:\...."));
        PCollection logObjects = input
                .apply("Conform", ParDo.of(
                        new conformToSchema()));

        PCollectionTuple multipleOutputs = (PCollectionTuple) logObjects.apply("Branch", ParDo.of(new branching(tags)).withOutputTags(all, tags));

        PCollection<logSchema> notices = multipleOutputs.get(noticesTag);
        PCollection<logSchema> errors = multipleOutputs.get(errorsTag);
        PCollection<logSchema> warning = multipleOutputs.get(warningsTag);
        PCollection<logSchema> sout = multipleOutputs.get(soutTag);

        try {
            p.run().waitUntilFinish();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

By the way, this is my first ever post here, yay.

Is there a reason your TupleTags are constructed without a type parameter?

TupleTag<logSchema> all = new TupleTag<>();

The example shows a type parameter and an empty anonymous-class body:

TupleTag<logSchema> all = new TupleTag<logSchema>(){};
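To illustrate why the trailing {} can matter, here is a minimal plain-Java sketch. It does not use Beam: Tag and capturedType are hypothetical stand-ins invented for this illustration. It shows the general Java behavior that an anonymous subclass records its generic type argument where reflection can read it, while a plain instance does not. This is, as far as I know, the reason the TupleTag documentation recommends the extra {} (to help with default Coder inference).

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

class TypeTokenDemo {
    // Hypothetical stand-in for a generic tag class; this is NOT Beam's TupleTag.
    static class Tag<T> {
        // Returns the type argument recorded in the subclass declaration,
        // or null when the instance is not a parameterized subclass.
        Type capturedType() {
            Type superType = getClass().getGenericSuperclass();
            if (superType instanceof ParameterizedType) {
                return ((ParameterizedType) superType).getActualTypeArguments()[0];
            }
            return null;
        }
    }

    public static void main(String[] args) {
        // Plain instance: the type argument is erased at runtime.
        Tag<String> plain = new Tag<>();
        // Anonymous subclass: its declaration "extends Tag<String>" is kept in class metadata.
        Tag<String> anonymous = new Tag<String>() {};

        System.out.println("plain:     " + plain.capturedType());
        System.out.println("anonymous: " + anonymous.capturedType());
    }
}
```

Note that this concerns runtime type information only. It is a separate matter from the compile-time "capture of ?" error in the question.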

Unrelated style nit: class names should be capitalized in Java; it makes your code more readable.
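On the compile error itself: TupleTagList.get(int) returns TupleTag<?>, so out.get(tags.get(0)) yields a receiver whose element type is an unknown capture, and the compiler cannot prove that a logSchema fits it. A possible fix is to hand each typed TupleTag<logSchema> to the DoFn directly instead of a TupleTagList. The sketch below reproduces the same generic shape in plain Java, without Beam. Tag, Out, MultiOut, route, and collected are hypothetical names made up for this illustration, not Beam classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class WildcardCaptureDemo {
    // Hypothetical stand-ins that only mimic the shape of the generic signatures involved.
    static class Tag<T> {}

    interface Out<T> {
        void output(T value);
    }

    static class MultiOut {
        private final Map<Tag<?>, List<Object>> outputs = new HashMap<>();

        // The receiver's element type is tied to the tag's type parameter.
        <T> Out<T> get(Tag<T> tag) {
            return value -> outputs.computeIfAbsent(tag, k -> new ArrayList<>()).add(value);
        }

        // Returns everything emitted under the given tag so far.
        List<Object> collected(Tag<?> tag) {
            return outputs.getOrDefault(tag, new ArrayList<>());
        }
    }

    // Typed tags are passed in directly, so get(...) returns Out<String>.
    static MultiOut route(List<String> lines, Tag<String> errors, Tag<String> others) {
        MultiOut out = new MultiOut();
        for (String line : lines) {
            if (line.startsWith("[error]")) {
                out.get(errors).output(line);
            } else {
                out.get(others).output(line);
            }
            // By contrast, looking the tag up in a wildcard list loses the type:
            //   List<Tag<?>> untyped = Arrays.asList(errors, others);
            //   out.get(untyped.get(0)).output(line);
            // does not compile, because String cannot be converted to "capture of ?".
        }
        return out;
    }

    public static void main(String[] args) {
        Tag<String> errors = new Tag<>();
        Tag<String> others = new Tag<>();
        MultiOut out = route(Arrays.asList("[error] disk full", "[notice] started"), errors, others);
        System.out.println(out.collected(errors));
        System.out.println(out.collected(others));
    }
}
```

Applied to the question's code, this would mean giving branching four TupleTag<logSchema> fields (or constructor parameters) rather than one TupleTagList. The forced cast to PCollectionTuple likely has a related cause: logObjects is declared as a raw PCollection, and calling a generic method on a raw type erases its return type, so declaring it as PCollection<logSchema> should make the cast unnecessary.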
