
How to implement groupby(column1, column2) in Apache Beam

Original post: 2020-09-21 15:11:11 · Tags: google-cloud-dataflow, apache-beam

I need help writing the equivalent Beam code in Python for the following Spark SQL code.

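from pyspark.sql.functions import count

# Count the rows for each (State, Color) pair, largest totals first.
count_mnm_df = (mnm_df
     .select("State", "Color", "Count")
     .groupBy("State", "Color")
     .agg(count("Count").alias("Total"))
     .orderBy("Total", ascending=False))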

1 Answer

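The most direct mapping of the above is probably Beam SQL; see the Beam SQL documentation for more information. See also the documentation of the corresponding Python transform, which includes usage information. Note that the Python SDK supports Beam SQL through Beam's relatively new cross-language transform support.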

Alternatively, you can write a Beam pipeline that performs the same computation with the available Beam transforms.

Note that Beam does not guarantee the order of elements in a PCollection.

Related questions

Convert a string column to an integer column in Apache Beam?

How to implement a lookback in Apache Beam / Cloud Dataflow

Renaming column names and creating new column names using Apache Beam

How to remove special characters, including commas and quotes, from a column string in Apache Beam (Google Cloud Dataflow)

How to rename source columns to target column names while reading data from BigQuery into a PCollection in Apache Beam using the Python SDK

Group by an existing attribute present in a JSON string line in Apache Beam (Java)

Can Apache Beam detect the schema (column names) of a Parquet file like Spark and Pandas?

How to use Pandas in Apache Beam?

How to manage backpressure with Apache Beam

How to implement a CI/CD pipeline for Apache Beam/Dataflow classic templates (Python) and data pipelines


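Notice: Technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce this content, please cite this site's URL or the address of the original post. For any questions, contact yoyou2525@163.com.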
