How to convert scala spark.sql.dataFrame to Pandas data frame
I want to convert a Scala DataFrame into a pandas DataFrame:
val collection = spark.read.sqlDB(config)
collection.show()
// Goal: something like df = collection, where df is a pandas DataFrame
You are asking for a way to use a Python library from Scala, which seems a bit unusual to me. Are you sure you need to do that? Scala DataFrames have a rich API that will probably give you the functionality you would otherwise get from pandas.
If you still need pandas, I would suggest writing the data you need to a file (a CSV, for example). Then a Python application can load that file into a pandas DataFrame and work from there.
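A minimal sketch of the Python side of that workflow (the file path and columns here are made up for illustration; in practice the CSV would come from your Scala job, e.g. via `collection.write.csv(...)`):

```python
import pandas as pd

# Hypothetical path: wherever the Scala job wrote its CSV export.
# A tiny CSV is created here so the example is self-contained.
csv_path = "exported_data.csv"
with open(csv_path, "w") as f:
    f.write("id,name\n1,alice\n2,bob\n")

# Load the exported file into a regular pandas DataFrame
df = pd.read_csv(csv_path)
print(df.shape)  # (2, 2)
```

From this point on, `df` is an ordinary pandas DataFrame, completely decoupled from Spark.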
Trying to create a pandas object from Scala is probably overcomplicating things (and I am not sure it is currently possible).
I think if you want to use a pandas-based API in Spark code, you can install the Koalas Python library. Then whatever functions you want from the pandas API can be embedded directly in your Spark code.
To install Koalas:
pip install koalas