How do you import and use Spark-Koalas in Palantir Foundry?

How can I import and use the open source Python package "Koalas: pandas API for Apache Spark" in Palantir Foundry?

I know that packages not already available on the platform can be imported through Code Repo, and I have done this before. Can I follow the same process for the Koalas package, or do I need to take another route?

I was able to use Code Repo to upload a local clone of the package and then add the package in the platform using the steps detailed here: How to create python libraries and how to import it in palantir foundry

However, shortly afterwards, Palantir admins introduced an update that included Koalas as a native package on the platform. I have not yet had time to try using it for any major tasks.

Koalas is officially included in PySpark as **pandas API on Spark** as of Apache Spark 3.2. In Spark 3.2+, you no longer need to import koalas, since it comes with pyspark. The only required action is to add pandas and pyarrow, which are required dependencies that Code Repositories don't include by default. You can do so via the Libraries tab.
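Before running a transform, it can help to check that the two dependencies are actually resolvable in the environment. The following is a minimal sketch using only the Python standard library; the helper name `missing_dependencies` is hypothetical and is not part of Foundry or PySpark.

```python
from importlib.util import find_spec


def missing_dependencies(names=("pandas", "pyarrow")):
    """Return the top-level packages from `names` that cannot be found.

    Hypothetical helper for illustration; it only checks importability,
    not version compatibility.
    """
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Add these via the Libraries tab:", ", ".join(missing))
    else:
        print("pandas and pyarrow are available")
```

An empty list means both packages can be imported; anything returned still needs to be added to the repository.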

You can confirm that it works using this test transform:

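from transforms.api import transform_df, Output


@transform_df(
    Output("OUTPUT_DATASET_PATH"),
)
def compute():
    # pandas API on Spark ships with PySpark 3.2+
    import pyspark.pandas as ps

    psdf = ps.DataFrame(
        {'a': [1, 2, 3, 4, 5, 6],
         'b': [100, 200, 300, 400, 500, 600],
         'c': ["one", "two", "three", "four", "five", "six"]},
        index=[10, 20, 30, 40, 50, 60])
    # transform_df expects a Spark DataFrame; to_spark() drops the index
    # unless index_col is given
    return psdf.to_spark()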

To make sure that your Code Repository is on Spark 3.2+, merge any pending upgrade PRs. Prior to Spark 3.2, it was possible to import koalas through the Libraries tab.
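If the same code has to run both before and after the upgrade, the choice of module can be keyed off the Spark version. This is a minimal sketch, assuming the pre-3.2 package is importable as `databricks.koalas` (the import name used by the open source Koalas project); `pandas_api_module` and `load_pandas_api` are hypothetical helper names.

```python
import importlib


def pandas_api_module(spark_version):
    """Pick the module name providing the pandas API for a Spark version string."""
    # Compare only major and minor, e.g. "3.2.1" -> (3, 2)
    major, minor = (int(part) for part in spark_version.split(".")[:2])
    if (major, minor) >= (3, 2):
        return "pyspark.pandas"
    return "databricks.koalas"


def load_pandas_api():
    """Import and return the pandas-on-Spark module for the installed PySpark."""
    import pyspark  # imported lazily so the helper above stays dependency-free

    return importlib.import_module(pandas_api_module(pyspark.__version__))
```

Calling `ps = load_pandas_api()` inside a transform should then give a module exposing the same `DataFrame` constructor used in the test transform.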
