
Create Spark Dataframe from Pandas Dataframe with Nested Python Dictionaries and Numpy Arrays


I have a pandas dataframe containing both numpy arrays and dictionaries:

results_df.head(1)

best_params                                    cv_results                                
{'max_depth': 3, 'min_impurity_decrease': 0.2} {'mean_fit_time': [0.6320801575978597, 1.08473]} 

I would like to be able to create a Spark Dataframe containing similar nested structures (they can be Spark objects if needed), and I tried:

spark.createDataFrame(results_df)
TypeError: not supported type: <class 'numpy.ndarray'>
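The error occurs because Spark's schema inference only walks plain Python types (int, float, str, list, dict, ...), not numpy types. A minimal sketch reproducing the failure with a toy frame (the variable names here are illustrative, not from the original post):

import numpy as np
import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# A single ndarray cell is enough to trigger the same TypeError,
# since schema inference does not recognize numpy.ndarray.
toy = pd.DataFrame({"scores": [np.array([0.63, 1.08])]})
spark.createDataFrame(toy)  # TypeError: not supported type: <class 'numpy.ndarray'>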

One solution is to use Koalas, a Databricks-supported package that provides the pandas API on top of Spark. The performance is also pretty good. For more info on Koalas: https://koalas.readthedocs.io/en/latest/

import databricks.koalas as ks

kdf = ks.from_pandas(results_df)  # Koalas DataFrame backed by Spark
spark_df = kdf.to_spark()         # plain Spark DataFrame, if one is needed

It's as simple as this in Koalas!
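If adding the Koalas dependency is not an option, plain PySpark also works once the numpy objects are converted to native Python values. Below is a minimal sketch under that assumption (normalize is a hypothetical helper written for illustration, not part of any library):

import numpy as np
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

def normalize(value):
    # Recursively replace numpy objects with plain Python equivalents.
    if isinstance(value, np.ndarray):
        return value.tolist()
    if isinstance(value, np.generic):   # numpy scalars, e.g. np.float64
        return value.item()
    if isinstance(value, dict):
        return {k: normalize(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [normalize(v) for v in value]
    return value

# Dicts become MapType columns; if a dict mixes int and float values
# (as best_params does), supply an explicit schema rather than relying
# on inference, or cast the values to a single type first.
spark_df = spark.createDataFrame(results_df.applymap(normalize))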
