Apache Beam: Flattening PCollection<List<Foo>> to PCollection<Foo>

Say we have a nested list:

["a", "b"]
["c", "d"]

We can easily flat-map it with the Stream API like this:

Stream
        .of(List.of("a", "b"), List.of("c", "d"))
        .flatMap(List::stream)
        .forEach(System.out::println);

But doing the same with "FlatMapElements" was quite a mess:

Pipeline pipeline = Pipeline.create();
pipeline.apply(Create.of(List.of(List.of("a", "b"), List.of("c", "d"))))
        .apply(FlatMapElements.into(TypeDescriptor.of(String.class)).via(list -> list))
        .apply(ParDo.of(new SomeOutputFunction()));

Is there a better way to do a flat map here?
A simple flat-map job should not be this complicated, so I think I am missing something.
I cannot even replace .via(list -> list) with .via(Function.identity()) due to the type inference problem.

Use ".apply(Flatten.iterables())". It converts a "PCollection<List<Foo>>" into a "PCollection<Foo>".
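
As a minimal sketch of this suggestion applied to the pipeline from the question: the class name FlattenIterablesExample and the PrintFn DoFn are hypothetical stand-ins (PrintFn replaces the question's SomeOutputFunction), the explicit ListCoder is set only to avoid relying on coder inference, and the Beam Java SDK plus the direct runner are assumed to be on the classpath.

```java
import java.util.Arrays;
import java.util.List;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.PipelineResult;
import org.apache.beam.sdk.coders.ListCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.Flatten;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;

public class FlattenIterablesExample {

    // Hypothetical stand-in for the question's SomeOutputFunction: prints each element.
    static class PrintFn extends DoFn<String, Void> {
        @ProcessElement
        public void processElement(@Element String element) {
            System.out.println(element);
        }
    }

    // Builds and runs the pipeline, returning its final state.
    static PipelineResult.State run() {
        // Default options; assumes the direct runner is available on the classpath.
        Pipeline pipeline = Pipeline.create();

        List<List<String>> nested =
            Arrays.asList(Arrays.asList("a", "b"), Arrays.asList("c", "d"));

        // Each element of this PCollection is one inner list.
        PCollection<List<String>> lists =
            pipeline.apply(Create.of(nested).withCoder(ListCoder.of(StringUtf8Coder.of())));

        // Flatten.iterables() emits every item of every inner list as its own element.
        PCollection<String> flat = lists.apply(Flatten.<String>iterables());

        flat.apply(ParDo.of(new PrintFn()));

        return pipeline.run().waitUntilFinish();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The explicit type witness in Flatten.<String>iterables() is there to keep type inference out of the picture. The elements "a", "b", "c" and "d" are printed in no guaranteed order, since a PCollection is unordered.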

Please refer to the Flatten section of the Apache Beam programming guide: https://beam.apache.org/documentation/programming-guide/#flatten. Note that the example there uses Flatten.pCollections(), which merges several PCollections of the same type into one:

PCollectionList<String> collections = PCollectionList.of(pc1).and(pc2).and(pc3);

PCollection<String> merged = collections.apply(Flatten.<String>pCollections());
