
How to implement groupby(column1, column2) in Apache Beam

I need help writing equivalent Beam code in Python for the following Spark SQL code.

from pyspark.sql.functions import count

count_mnm_df = (mnm_df
     .select("State", "Color", "Count")
     .groupBy("State", "Color")
     .agg(count("Count").alias("Total"))
     .orderBy("Total", ascending=False))

Probably the most straightforward mapping of the above is Beam SQL; see the Beam SQL documentation for more information. The Python SDK provides a corresponding transform, SqlTransform, whose documentation also covers usage. Please note that Python SDK support is achieved through Beam's cross-language transforms support, which is relatively new.

You can also consider authoring a Beam pipeline that performs the same computation using the available Beam transforms.

Note that Beam does not guarantee the order of the elements of a PCollection, so there is no direct equivalent of orderBy.


