
Apache Beam: Flattening PCollection<List<Foo>> to PCollection<Foo>

Say we have some nested list:

["a", "b"]
["c", "d"]

With the Stream API, the flat map is easy:

Stream
        .of(List.of("a", "b"), List.of("c", "d"))
        .flatMap(List::stream)
        .forEach(System.out::println);

But doing the same thing with "FlatMapElements" is quite a mess:

Pipeline pipeline = Pipeline.create();
pipeline.apply(Create.of(List.of(List.of("a", "b"), List.of("c", "d"))))
        .apply(FlatMapElements.into(TypeDescriptor.of(String.class)).via(list -> list))
        .apply(ParDo.of(new SomeOutputFunction()));

Is there a better way to do a flat map here?
A simple flat map job should not be this complicated, so I think I am missing something.
I cannot even replace .via(list -> list) with .via(Function.identity()) because of a type inference problem.

Use ".apply(Flatten.iterables())". It converts a "PCollection<List<Foo>>" into a "PCollection<Foo>".
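
Applied to the nested list from the question, that could look roughly like the sketch below. The class name FlattenIterablesExample and the method flattenNested are made up for illustration, and the explicit ListCoder is only an assumption to avoid relying on coder inference for the list elements; it may not be needed in your setup.

```java
import java.util.List;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.ListCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.Flatten;
import org.apache.beam.sdk.values.PCollection;

// Hypothetical example class, not part of the original question.
public class FlattenIterablesExample {

    // Creates a PCollection<List<String>> and flattens it to a PCollection<String>.
    static PCollection<String> flattenNested(Pipeline pipeline) {
        List<List<String>> nested = List.of(List.of("a", "b"), List.of("c", "d"));

        // Each inner list becomes one element of the PCollection.
        // The coder is set explicitly (assumption: safer than relying on inference).
        PCollection<List<String>> lists = pipeline.apply(
                Create.of(nested).withCoder(ListCoder.of(StringUtf8Coder.of())));

        // Flatten.iterables() emits every item of every inner list as its own element.
        return lists.apply(Flatten.<String>iterables());
    }

    public static void main(String[] args) {
        Pipeline pipeline = Pipeline.create();
        flattenNested(pipeline);
        pipeline.run().waitUntilFinish();
    }
}
```

The resulting PCollection<String> can then be passed to the ParDo from the question. No lambda or TypeDescriptor is needed, which also sidesteps the Function.identity() type inference issue.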

Please refer to the Flatten section of the Apache Beam programming guide, https://beam.apache.org/documentation/programming-guide/#flatten , which shows how to merge several PCollections of the same type:

PCollectionList<String> collections = PCollectionList.of(pc1).and(pc2).and(pc3);

PCollection<String> merged = collections.apply(Flatten.<String>pCollections());
