How to convert a Scala spark.sql.DataFrame to a pandas DataFrame

Question

I want to convert a Scala Spark DataFrame into a pandas DataFrame:

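    // Assumes the azure-sqldb-spark connector, which provides read.sqlDB(config)
    import com.microsoft.azure.sqldb.spark.connect._

    val collection = spark.read.sqlDB(config)
    collection.show()

    // Goal: something like df = collection, where df is a pandas DataFrame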

Answer 1

You are asking for a way to use a Python library from Scala, which seems unusual to me. Are you sure you need to do that? You may already know this, but Spark DataFrames in Scala have a rich API that will probably give you the functionality you were looking for in pandas.

If you still need pandas, I would suggest writing the data you need to a file (a CSV, for example). A Python application can then load that file into a pandas DataFrame and work with it from there.
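The Python side of this file-based hand-off can be sketched as follows. This is a minimal sketch, assuming the Scala code wrote the DataFrame as uncompressed CSV with a header row, for example via `collection.write.option("header", "true").csv(path)`. Spark writes such output as a directory of `part-*.csv` files (one per partition) rather than a single file, so the helper reads every part file and concatenates them. The function name `read_spark_csv_dir` is hypothetical.

```python
import glob
import os
import tempfile

import pandas as pd


def read_spark_csv_dir(path: str) -> pd.DataFrame:
    """Combine the part files of a Spark CSV output directory into one pandas DataFrame.

    Assumes the Scala side wrote uncompressed CSV with a header row, e.g.
    collection.write.option("header", "true").csv(path), so every non-empty
    part file starts with the same header line.
    """
    # Spark writes one part-*.csv file per partition, plus marker files such as
    # _SUCCESS; sorting the zero-padded names keeps the partitions in order.
    part_files = sorted(glob.glob(os.path.join(path, "part-*.csv")))
    # Skip zero-byte part files: pandas cannot parse a file with no header row.
    frames = [pd.read_csv(f) for f in part_files if os.path.getsize(f) > 0]
    if not frames:
        raise FileNotFoundError(f"no non-empty part-*.csv files in {path!r}")
    # Stack the partitions and renumber the index 0..n-1.
    return pd.concat(frames, ignore_index=True)


if __name__ == "__main__":
    # Demo: fake a two-partition Spark output directory with made-up rows.
    with tempfile.TemporaryDirectory() as out_dir:
        for name, body in [("part-00000.csv", "id,name\n1,alice\n"),
                           ("part-00001.csv", "id,name\n2,bob\n")]:
            with open(os.path.join(out_dir, name), "w") as fh:
                fh.write(body)
        open(os.path.join(out_dir, "_SUCCESS"), "w").close()
        print(read_spark_csv_dir(out_dir))
```

If a single file is easier to hand over, calling `coalesce(1)` on the DataFrame before writing produces one part file, at the cost of funnelling all rows through a single task.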

Trying to create a pandas object from Scala is probably overcomplicating things (and I am not sure it is currently possible).

Answer 2

If you want a pandas-style API in Spark code, you can install the Koalas Python library, which implements the pandas DataFrame API on top of Spark. Whatever pandas functions you want to use can then be called directly in your Spark code. Note that Koalas is a Python package, so this applies to PySpark code rather than Scala.

To install Koalas:

    pip install koalas

Answer 1: score 1, 2019-08-05 15:10:55
Answer 2: score 0, 2019-08-05 06:47:21
