Spark Arrow, toPandas() and wide transformation

Original post: 2019-08-29 16:15:25 | Tags: python, pandas, apache-spark, apache-arrow

What does toPandas() actually do when Arrow optimization is used?

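Is the resulting pandas dataframe safe for wide transformations (those that require data shuffling), e.g. .merge operations? What about group and aggregate? What kind of performance limitations should I expect?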

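I am trying to standardize on pandas dataframes wherever possible, because they are easy to unit test and can be swapped for in-memory objects without starting the monstrous Spark instance.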
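For context, this is roughly the kind of test I have in mind: a minimal sketch in which the transformation is a plain function over a pandas dataframe, so it can be exercised with a small in-memory frame. The function name `total_by_customer` and the column names are made up for illustration.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def total_by_customer(orders: pd.DataFrame) -> pd.DataFrame:
    """Sum the (hypothetical) `amount` column per `customer_id`."""
    return (
        orders.groupby("customer_id", as_index=False)["amount"]
        .sum()
        .sort_values("customer_id")
        .reset_index(drop=True)
    )


def test_total_by_customer() -> None:
    # Small in-memory input; no Spark session is needed.
    orders = pd.DataFrame(
        {"customer_id": [1, 2, 1], "amount": [10.0, 5.0, 2.5]}
    )
    expected = pd.DataFrame({"customer_id": [1, 2], "amount": [12.5, 5.0]})
    assert_frame_equal(total_by_customer(orders), expected)


test_total_by_customer()
```

The same function could then be applied to whatever toPandas() returns, provided the collected data fits in memory.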

1 Answer

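toPandas() takes your Spark dataframe object and pulls all of its partitions onto the client driver machine as a single pandas dataframe. Any operation on this new object (the pandas dataframe) runs on a single machine in Python, so wide transformations in the Spark sense are no longer possible: you are no longer using the Spark cluster's distributed computing (i.e., there is no interaction between partitions or worker nodes).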
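As a rough illustration, here is a sketch under a few assumptions. `collect_with_arrow` is a hypothetical helper, not part of any library. The config key it sets is the Spark 3.x name; Spark 2.3/2.4 used `spark.sql.execution.arrow.enabled`. Arrow only changes how rows are serialized on their way to the driver, and the result is still an ordinary pandas dataframe. The merge and groupby below run on small made-up frames that stand in for toPandas() output, so the example runs without a cluster.

```python
import pandas as pd


def collect_with_arrow(spark, spark_df):
    """Hypothetical helper: enable Arrow, then collect to the driver.

    `spark` is an existing SparkSession, `spark_df` a Spark DataFrame.
    The key below is the Spark 3.x name; Spark 2.3/2.4 used
    "spark.sql.execution.arrow.enabled".
    """
    spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")
    # All partitions are pulled into driver memory as one pandas frame.
    return spark_df.toPandas()


# Made-up frames standing in for the result of toPandas().
users = pd.DataFrame({"user_id": [1, 2, 3], "country": ["DE", "FR", "DE"]})
events = pd.DataFrame({"user_id": [1, 1, 2, 3], "clicks": [3, 4, 5, 1]})

# pandas merge and groupby work, but in one Python process, in memory,
# with no shuffle across partitions or worker nodes.
joined = events.merge(users, on="user_id", how="inner")
clicks_by_country = joined.groupby("country")["clicks"].sum()

# Rough size check: the collected frame has to fit in driver memory.
approx_bytes = int(joined.memory_usage(deep=True).sum())
print(clicks_by_country.to_dict(), approx_bytes)
```

So `.merge` and group-and-aggregate are fine as pandas operations; the practical limit is that everything must fit in the driver's memory and is processed by a single machine.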

Related questions

Spark 2.0 toPandas method

chunk topandas from spark dataframe

What is the Spark DataFrame method `toPandas` actually doing?

spark possible to split dataframe into parts for topandas

Take n rows from a spark dataframe and pass to toPandas()

List Transformation With Lambdas in Spark

Abnormal Spark dataframe transformation

Spark: Dataframe Transformation

pandas data transformation long-wide-long

Spark RDD and Dataframe transformation optimisation

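Notice: Technical posts on this site are published under the CC BY-SA 4.0 license. If you repost, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.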
