Trouble emitting elements to multiple output PCollections in Apache Beam

I'm having trouble using MultiOutputReceiver to emit into a PCollectionTuple with TupleTags in Apache Beam.

logSchema is my Avro-generated class for handling incoming logs (date, type, message, etc.). What I want to do is store the different types of logs (errors, warnings, notices) in different PCollections.

I get this error:

java: incompatible types: logSchema cannot be converted to capture#1 of ?

for every out.get(tags.get(0)).output(log); call inside processElement in branching extends DoFn<logSchema, logSchema>.

Basically: Required type: capture of ? Provided: logSchema

I've mostly followed the Beam Programming Guide covering additional outputs but also what other examples I could find here.

Anyone care to explain what I'm getting wrong? I'm feeling lost but also close.

Edit: Seems like I forgot the .withOutputTags() call on the ParDo that applies branching. I've added it to the code below, but I still get the same incompatible types error. IDEA underlines the line in red (it did before as well) and wants a cast to (PCollectionTuple). Why is this necessary?

Here is my code

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionTuple;
import org.apache.beam.sdk.values.TupleTag;
import org.apache.beam.sdk.values.TupleTagList;
import java.text.ParseException;

public class Pipe extends Thread {

    public Pipe() {
    }

    static class conformToSchema extends DoFn<String, logSchema> {
        @ProcessElement
        public void processElement(@Element String element, OutputReceiver<logSchema> receiver ) throws ParseException {
            logSchema log = new logSchema(element);
            receiver.output(log);
        }
    }

    static class branching extends DoFn<logSchema, logSchema> {
        private TupleTagList tags;
        public branching(TupleTagList tags) {
            this.tags = tags;
        }
        @ProcessElement
        public void processElement(@Element logSchema log, MultiOutputReceiver out ) {
            if (log.getType().equals("[notice]")) out.get(tags.get(0)).output(log);
            else if (log.getType().equals("[error]")) out.get(tags.get(1)).output(log);
            else if (log.getType().equals("[warn]")) out.get(tags.get(2)).output(log);
            else if (log.getType().equals("[sout]") ) out.get(tags.get(3)).output(log);
        }
    }

    public void run(){
        TupleTag<logSchema> all = new TupleTag<>();
        TupleTag<logSchema> noticesTag = new TupleTag<>();
        TupleTag<logSchema> errorsTag = new TupleTag<>();
        TupleTag<logSchema> warningsTag = new TupleTag<>();
        TupleTag<logSchema> soutTag = new TupleTag<>();
        TupleTagList tags = TupleTagList.of(noticesTag).and(errorsTag).and(warningsTag).and(soutTag);

        PipelineOptions options = PipelineOptionsFactory.create();

        Pipeline p = Pipeline.create();
        PCollection<String> input = p.apply(TextIO.read().from("C:\...."));
        PCollection logObjects = input
                .apply("Conform", ParDo.of(
                        new conformToSchema()));

        PCollectionTuple multipleOutputs = (PCollectionTuple) logObjects.apply("Branch", ParDo.of(new branching(tags)).withOutputTags(all, tags));

        PCollection<logSchema> notices = multipleOutputs.get(noticesTag);
        PCollection<logSchema> errors = multipleOutputs.get(errorsTag);
        PCollection<logSchema> warning = multipleOutputs.get(warningsTag);
        PCollection<logSchema> sout = multipleOutputs.get(soutTag);

        try {
            p.run().waitUntilFinish();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Btw first ever post here, yay.

Is there a reason your TupleTags are created with the diamond operator and without an anonymous subclass body?

TupleTag<logSchema> all = new TupleTag<>();

The example shows an explicit type parameter and an empty anonymous-class body ({}):

TupleTag<logSchema> all = new TupleTag<logSchema>(){};
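Separately, the compile error in the question looks like a wildcard-capture issue. TupleTagList.get(int) returns TupleTag<?>, so out.get(tags.get(0)) yields an OutputReceiver<capture of ?>, which cannot accept a logSchema. Below is a minimal plain-Java sketch of that mechanism. Tag, TagList and MultiReceiver are hypothetical stand-ins that only mimic the generic signatures involved; they are not Beam classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical stand-in for TupleTag<T>.
final class Tag<T> {
    final String id;
    Tag(String id) { this.id = id; }
}

// Hypothetical stand-in for TupleTagList.
final class TagList {
    private final List<Tag<?>> tags = new ArrayList<>();
    TagList and(Tag<?> tag) { tags.add(tag); return this; }
    // Like TupleTagList.get(int): the element type is forgotten here.
    Tag<?> get(int index) { return tags.get(index); }
}

// Hypothetical stand-in for MultiOutputReceiver.
final class MultiReceiver {
    private final Map<String, List<Object>> outputs = new HashMap<>();
    // Like MultiOutputReceiver.get(TupleTag<T>): the receiver type follows the tag type.
    <T> Consumer<T> get(Tag<T> tag) {
        return value -> outputs.computeIfAbsent(tag.id, k -> new ArrayList<>()).add(value);
    }
    List<Object> emitted(Tag<?> tag) {
        return outputs.getOrDefault(tag.id, new ArrayList<>());
    }
}

class WildcardCaptureDemo {
    static List<Object> emitTyped(String log) {
        Tag<String> errorsTag = new Tag<>("errors");
        TagList tags = new TagList().and(errorsTag);
        MultiReceiver out = new MultiReceiver();

        // Does not compile: tags.get(0) is Tag<?>, so the receiver is
        // Consumer<capture of ?> and a String cannot be passed to it.
        // out.get(tags.get(0)).accept(log);

        // Compiles: the typed tag produces a Consumer<String>.
        out.get(errorsTag).accept(log);
        return out.emitted(errorsTag);
    }

    public static void main(String[] args) {
        System.out.println(emitTyped("[error] disk full"));
    }
}
```

If this reading is right, passing the four typed TupleTag<logSchema> instances to the branching constructor (instead of the TupleTagList) and calling, for example, out.get(noticesTag).output(log) should type-check.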

Unrelated style nit: class names should be capitalized in Java; it makes your code more readable.
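On the (PCollectionTuple) cast that IDEA asks for: a probable cause is that logObjects is declared as a raw PCollection rather than PCollection<logSchema>. When a generic method is called on a raw type, its signature is erased, so the inferred return type is lost and the compiler only sees the erased bound. The plain-Java sketch below illustrates this; Coll and Tuple are hypothetical stand-ins that mimic the shape of PCollection.apply and PCollectionTuple, not Beam classes.

```java
import java.util.function.Function;

// Hypothetical stand-in for PCollection<T>; apply() has a method-level
// type parameter for its return type, as PCollection.apply does.
final class Coll<T> {
    final T sample;
    Coll(T sample) { this.sample = sample; }
    <OutT> OutT apply(Function<? super Coll<T>, OutT> transform) {
        return transform.apply(this);
    }
}

// Hypothetical stand-in for PCollectionTuple.
final class Tuple {
    final String description;
    Tuple(String description) { this.description = description; }
}

class RawTypeDemo {
    static Tuple branch(Coll<String> logs) {
        Function<Coll<String>, Tuple> toTuple = c -> new Tuple("branched:" + c.sample);
        // Parameterized receiver: OutT is inferred as Tuple, no cast needed.
        return logs.apply(toTuple);
    }

    @SuppressWarnings({"rawtypes", "unchecked"})
    static Tuple branchRaw(Coll logs) {
        Function<Coll<String>, Tuple> toTuple = c -> new Tuple("branched:" + c.sample);
        // Raw receiver: apply() is erased and returns Object, so a cast is
        // required, much like the (PCollectionTuple) cast in the question.
        return (Tuple) logs.apply(toTuple);
    }

    public static void main(String[] args) {
        Coll<String> logs = new Coll<>("[warn] low memory");
        System.out.println(branch(logs).description);
        System.out.println(branchRaw(logs).description);
    }
}
```

Under that assumption, declaring PCollection<logSchema> logObjects in run() should let the compiler infer PCollectionTuple for the "Branch" step without a cast.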
