Apache Beam: Flattening PCollection<List<Foo>> to PCollection<Foo>
Say we have some nested lists:
["a", "b"]
["c", "d"]
With the Stream API we can easily flat-map them like this:
Stream
.of(List.of("a", "b"), List.of("c", "d"))
.flatMap(List::stream)
.forEach(System.out::println);
But doing the same with "FlatMapElements" turned out to be quite messy:
Pipeline pipeline = Pipeline.create();
pipeline.apply(Create.of(List.of(List.of("a", "b"), List.of("c", "d"))))
.apply(FlatMapElements.into(TypeDescriptor.of(String.class)).via(list -> list))
.apply(ParDo.of(new SomeOutputFunction()));
Is there a better way to do this flat map?
A simple flat-map job should not be this complicated, so I think I am missing something.
I cannot even replace .via(list -> list) with .via(Function.identity()) because of a type inference problem.
Use ".apply(Flatten.iterables())". It converts a "PCollection<List<Foo>>" into a "PCollection<Foo>".
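As a minimal sketch of how Flatten.iterables() could replace the FlatMapElements step from the question: the class name FlattenIterablesExample is made up for illustration, the explicit ListCoder is an assumption added so the input clearly has an iterable coder, and the code assumes beam-sdks-java-core plus a runner such as the direct runner are on the classpath.

```java
import java.util.List;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.ListCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.Flatten;
import org.apache.beam.sdk.values.PCollection;

// Hypothetical example class; not part of the original question or answer.
public class FlattenIterablesExample {
    public static void main(String[] args) {
        Pipeline pipeline = Pipeline.create();

        // Each element of this PCollection is itself a List<String>.
        // The coder is set explicitly (an assumption of this sketch) because
        // Flatten.iterables() expects the input to use an iterable-like coder.
        PCollection<List<String>> nested = pipeline.apply(
            Create.of(List.of(List.of("a", "b"), List.of("c", "d")))
                .withCoder(ListCoder.of(StringUtf8Coder.of())));

        // Flatten.iterables() emits every inner element as its own element.
        PCollection<String> flat = nested.apply(Flatten.iterables());

        // Further transforms (such as the ParDo from the question) would be
        // applied to "flat" here before running the pipeline.
        pipeline.run().waitUntilFinish();
    }
}
```

The resulting PCollection holds the four strings as separate elements. Beam does not guarantee element order, so downstream steps should not rely on it.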
Please refer to the Flatten section of the Apache Beam programming guide: https://beam.apache.org/documentation/programming-guide/#flatten . Note that the example there uses the related Flatten.pCollections(), which merges several PCollections of the same type into one rather than unnesting lists:
PCollectionList<String> collections = PCollectionList.of(pc1).and(pc2).and(pc3);
PCollection<String> merged = collections.apply(Flatten.<String>pCollections());
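On the Function.identity() point from the question: a likely explanation is that the via(...) overloads accept Beam's own function types (for example ProcessFunction or SerializableFunction), not java.util.function.Function. A lambda is target-typed, so it can become whichever interface the method asks for, while Function.identity() already has a fixed type. The sketch below shows this in plain Java; SerializableFn and via are hypothetical stand-ins, not Beam's actual API.

```java
import java.io.Serializable;
import java.util.List;

// Hypothetical example class illustrating target typing; it does not use Beam.
public class IdentityTargetTyping {
    // Stand-in for a Beam-style serializable function type (an assumption).
    public interface SerializableFn<I, O> extends Serializable {
        O apply(I input);
    }

    // Stand-in for a via(...)-like method that accepts only SerializableFn.
    public static <I, O> O via(SerializableFn<I, O> fn, I input) {
        return fn.apply(input);
    }

    public static void main(String[] args) {
        List<String> list = List.of("a", "b");

        // The lambda has no type of its own; the compiler turns it into a
        // SerializableFn because that is what via(...) expects.
        List<String> result = via(l -> l, list);
        System.out.println(result); // prints [a, b]

        // This would not compile: Function.identity() returns a
        // java.util.function.Function, which is not a SerializableFn.
        // via(java.util.function.Function.identity(), list);
    }
}
```

So the lambda form list -> list is not redundant in that call; it is what lets the compiler pick the function type the method requires.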