
How do I import a table DIRECTLY into a Python dataframe within Databricks?

Original post: 2020-12-04 15:23:15 6 2 python/ databricks

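I'm currently working in a dev environment in Databricks, using a notebook to apply some Python code that analyses dummy data (just a few thousand rows) held in a database table. I then deploy this to the main environment and run it on the real data (millions of rows).

To start with, I only need the values from a single column that meet a certain condition. To get the data I'm currently doing this: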

spk_data = spark.sql("SELECT field FROM database.table WHERE field == 'value'")
data = spk_data.toPandas()

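The rest of the Python notebook then works on that data, and this runs fine in the dev environment. When I run it for real, though, it falls over at line 2, saying it has run out of memory.

I want to import the data DIRECTLY into a Pandas dataframe, removing the need to convert from Spark, as I'm assuming that will avoid the error. After a lot of Googling I still can't work out how. The only thing I've tried that is syntactically valid is: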

data = pd.read_table(r'database.table')

But I just get:

'PermissionError: [Errno 13] Permission denied:'

(N.B. Unfortunately, I have no control over the contents, form or location of the database I'm querying.)

2 answers

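Your assumption is most likely incorrect.

Spark is a distributed computation engine; pandas is a single-node toolset. So when you run a query over millions of rows, it is likely to fail. When you call df.toPandas, Spark moves all of the data to your driver node, so if the data is larger than the driver's memory, the call fails with an out-of-memory exception. In other words, if your dataset is larger than memory, pandas will not work well.

Also, when you use pandas on Databricks you lose all the benefits of the underlying cluster. You are only using the driver.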
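One rough way to gauge whether the production result would fit on the driver is to measure the per-row footprint of the pandas frame in the dev environment and extrapolate. The following is a minimal sketch only: the helper name `estimate_pandas_bytes`, the sample data, and the target row count are illustrative assumptions, not taken from the original post.

```python
import pandas as pd


def estimate_pandas_bytes(sample: pd.DataFrame, target_rows: int) -> int:
    """Extrapolate the in-memory size of a pandas frame to target_rows rows."""
    if len(sample) == 0:
        raise ValueError("sample must contain at least one row")
    # deep=True also counts the string objects held in object columns
    sample_bytes = int(sample.memory_usage(deep=True).sum())
    return sample_bytes * target_rows // len(sample)


# Hypothetical stand-in for the small dev result of spk_data.toPandas()
sample = pd.DataFrame({"field": ["value"] * 1000})

# Assumed production row count, for illustration only
estimated = estimate_pandas_bytes(sample, target_rows=5_000_000)
print(f"estimated size: {estimated / 1024**2:.0f} MiB")
```

The estimate is linear and ignores the extra copies made during conversion, so it is best read as a lower bound when comparing against the driver's memory.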

There are two sensible options to solve this:

redo your solution using Spark
use Koalas, whose API is mostly compatible with pandas

For this case you have to use pd.read_sql_query.
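`pd.read_sql_query` takes a SQL string and a database connection rather than a table name, so it needs a connection object for the database being queried. The sketch below uses an in-memory SQLite database as a stand-in. The `demo` table and the connection are assumptions, since the post does not say how the Databricks table would be reached. Passing `chunksize` makes the rows arrive in batches instead of one large frame.

```python
import sqlite3

import pandas as pd

# Stand-in database; a real setup would need a connection to the actual source
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE demo (field TEXT)")
con.executemany(
    "INSERT INTO demo (field) VALUES (?)",
    [("value",), ("other",), ("value",)],
)

query = "SELECT field FROM demo WHERE field = ?"

# With chunksize set, read_sql_query returns an iterator of DataFrames
total_rows = 0
for chunk in pd.read_sql_query(query, con, params=("value",), chunksize=2):
    total_rows += len(chunk)  # process each batch here

con.close()
print(total_rows)
```

Note that this still pulls the rows onto a single machine, so it only helps with memory if each batch is processed and discarded rather than accumulated.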


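Notice: technical posts on this site follow the CC BY-SA 4.0 license. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.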
