
How do you import and use Spark-Koalas in Palantir Foundry?

Asked 2021-04-08 19:27:39. Tags: python, pandas, pyspark, palantir-foundry, spark-koalas

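How can I import and use the open-source Python package "Koalas: pandas API for Apache Spark" in Palantir Foundry?

I know that a package missing from the platform can be imported through a Code Repo. Can I follow the same process for the Koalas package, or do I need to take a different route?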

2 answers

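I was able to upload a local clone of the package through a Code Repo and then add the package to the platform by following the steps detailed here: How to create a Python library and how to import it into another repository

Shortly afterwards, however, the Palantir administrators rolled out an update that included the Koalas package as a native package on the platform. So far I have not had time to try it on any major task.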

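Koalas was officially included in PySpark as the **pandas API on Spark** in Apache Spark 3.2. On Spark 3.2+ you no longer need to import Koalas, because it ships with PySpark. The only action required is to add pandas and pyarrow, since these are required dependencies that Code Repositories do not include by default. You can do this through the Libraries tab.

You can confirm that it works with this test transform: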

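from transforms.api import transform_df, Output


@transform_df(
    Output("OUTPUT_DATASET_PATH"),
)
def compute():
    # pandas API on Spark ships with PySpark 3.2+
    import pyspark.pandas as ps

    psdf = ps.DataFrame(
        {
            'a': [1, 2, 3, 4, 5, 6],
            'b': [100, 200, 300, 400, 500, 600],
            'c': ["one", "two", "three", "four", "five", "six"],
        },
        index=[10, 20, 30, 40, 50, 60],
    )
    # Convert back to a Spark DataFrame, which is what transform_df returns
    return psdf.to_spark()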

To make sure your Code Repository is on Spark 3.2+, merge any pending upgrade PRs. Before Spark 3.2, Koalas could be imported through the Libraries tab.
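If the same code has to work on both sides of the Spark 3.2 boundary, the choice of module can be made from the Spark version string. The following is only a minimal sketch: the helper names `parse_major_minor`, `pandas_api_module_name` and `load_pandas_api` are hypothetical and not part of Foundry or PySpark, and it assumes the version string starts with `major.minor` (for example `3.2.1`).

```python
import importlib
import re


def parse_major_minor(version):
    """Return (major, minor) from a version string such as '3.2.1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("Unrecognized Spark version: %r" % version)
    return int(match.group(1)), int(match.group(2))


def pandas_api_module_name(spark_version):
    """Pick the module that provides the pandas-style API (hypothetical helper)."""
    if parse_major_minor(spark_version) >= (3, 2):
        # Bundled with PySpark from Spark 3.2 onward
        return "pyspark.pandas"
    # Standalone Koalas package, added through the Libraries tab
    return "databricks.koalas"


def load_pandas_api(spark_version):
    """Import and return the chosen module; it must be installed in the repo."""
    return importlib.import_module(pandas_api_module_name(spark_version))
```

Inside a transform this could be used as `ps = load_pandas_api(pyspark.__version__)`, after which the rest of the code refers to `ps` regardless of which package backs it.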

