How to implement groupby(column1, column2) in Apache Beam
I need help writing equivalent Beam code in Python for the following Spark SQL code:
count_mnm_df = (mnm_df
    .select("State", "Color", "Count")
    .groupBy("State", "Color")
    .agg(count("Count").alias("Total"))
    .orderBy("Total", ascending=False))
Probably the most straightforward mapping of the above is Beam SQL. See here for more information, and see here for the corresponding Python transform, which also contains usage information. Note that Python SDK support for Beam SQL is achieved through Beam's cross-language transforms support, which is relatively new.
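With Beam SQL, the Spark query maps almost one-to-one onto a SqlTransform applied to a schema'd PCollection. Below is a minimal sketch; the MnmRow type and sample values are made up for illustration, COUNT(*) stands in for count("Count") (equivalent when Count is never NULL), and Beam SQL only allows ORDER BY when paired with LIMIT. Running it requires a Java expansion service, since SqlTransform is a cross-language transform:

```python
import typing

import apache_beam as beam
from apache_beam.transforms.sql import SqlTransform

# Hypothetical schema'd row type standing in for mnm_df's columns.
class MnmRow(typing.NamedTuple):
    State: str
    Color: str
    Count: int

beam.coders.registry.register_coder(MnmRow, beam.coders.RowCoder)

with beam.Pipeline() as pipeline:
    _ = (
        pipeline
        | "CreateRows" >> beam.Create([
            MnmRow("CA", "Red", 10),
            MnmRow("CA", "Red", 4),
            MnmRow("TX", "Blue", 2),
        ]).with_output_types(MnmRow)
        # GROUP BY two columns mirrors groupBy("State", "Color");
        # Beam SQL requires LIMIT alongside ORDER BY.
        | "GroupAndCount" >> SqlTransform("""
            SELECT State, Color, COUNT(*) AS Total
            FROM PCOLLECTION
            GROUP BY State, Color
            ORDER BY Total DESC
            LIMIT 100""")
        | "Print" >> beam.Map(print)
    )
```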
You can also consider authoring a Beam pipeline, using available Beam transforms, that performs the same computation.
Note that Beam does not guarantee the order of elements of a PCollection.