How to implement groupby(column1, column2) in Apache Beam
I need help writing equivalent Beam code in Python for the following Spark SQL code:
count_mnm_df = (mnm_df
    .select("State", "Color", "Count")
    .groupBy("State", "Color")
    .agg(count("Count").alias("Total"))
    .orderBy("Total", ascending=False))
Probably the most straightforward mapping of the above is Beam SQL. See here for more information, and see here for the corresponding Python transform, which also contains usage information. Note that Python SDK support for Beam SQL is achieved through Beam's cross-language transforms support, which is relatively new.
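With Beam SQL, the Spark query maps almost one-to-one onto a SqlTransform applied to a schema'd PCollection. Below is a minimal sketch; the MnmRow type and sample values are made up for illustration, COUNT(*) stands in for count("Count") (equivalent when Count is never NULL), and Beam SQL only allows ORDER BY when paired with LIMIT. Running it requires a Java expansion service, since SqlTransform is a cross-language transform:

```python
import typing

import apache_beam as beam
from apache_beam.transforms.sql import SqlTransform

# Hypothetical schema'd row type standing in for mnm_df's columns.
class MnmRow(typing.NamedTuple):
    State: str
    Color: str
    Count: int

beam.coders.registry.register_coder(MnmRow, beam.coders.RowCoder)

with beam.Pipeline() as pipeline:
    _ = (
        pipeline
        | "CreateRows" >> beam.Create([
            MnmRow("CA", "Red", 10),
            MnmRow("CA", "Red", 4),
            MnmRow("TX", "Blue", 2),
        ]).with_output_types(MnmRow)
        # GROUP BY two columns mirrors groupBy("State", "Color");
        # Beam SQL requires LIMIT alongside ORDER BY.
        | "GroupAndCount" >> SqlTransform("""
            SELECT State, Color, COUNT(*) AS Total
            FROM PCOLLECTION
            GROUP BY State, Color
            ORDER BY Total DESC
            LIMIT 100""")
        | "Print" >> beam.Map(print)
    )
```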
You can also consider authoring a Beam pipeline, using available Beam transforms, that performs the same computation.
Note that Beam does not guarantee the order of elements of a PCollection.