
Beam/Dataflow 2.2.0 - extract first n elements from pcollection

Is there any way to extract the first n elements of a Beam PCollection? The documentation doesn't seem to offer any such function. I think such an operation would first require a global element-number assignment and then a filter; it would be nice to have this functionality.

I am using the Google Dataflow Java SDK 2.2.0.

PCollections are unordered per se, so the notion of "first N elements" does not exist. However:

  • In case you need the top N elements by some criterion, you can use the Top transform.

  • In case you need any N elements, you can use Sample.
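For example, both transforms can be applied like this. This is a minimal sketch against the Beam Java SDK (the class name, input values, and pipeline wiring are illustrative, not from the original answer); running it needs `beam-sdks-java-core` and a runner such as the DirectRunner on the classpath:

```java
import java.util.Arrays;
import java.util.List;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.testing.PAssert;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.Sample;
import org.apache.beam.sdk.transforms.Top;
import org.apache.beam.sdk.values.PCollection;

public class TopAndSampleExample {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    PCollection<Integer> numbers = p.apply(Create.of(5, 3, 9, 1, 7));

    // Top.largest(n): the n largest elements by natural order,
    // delivered as a single List in decreasing order.
    PCollection<List<Integer>> top3 = numbers.apply(Top.<Integer>largest(3));

    // Sample.any(n): any n elements, with no guarantee about
    // which elements are picked or in what order.
    PCollection<Integer> any3 = numbers.apply(Sample.<Integer>any(3));

    // Sanity checks (PAssert ships with beam-sdks-java-core).
    PAssert.that(top3).containsInAnyOrder(Arrays.asList(9, 7, 5));
    PAssert.thatSingleton(any3.apply(Count.<Integer>globally())).isEqualTo(3L);

    p.run().waitUntilFinish();
  }
}
```

Note the output shapes differ: Top yields a PCollection with a single List element, while Sample.any yields a PCollection of the sampled elements themselves.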


