TypeError converting a Pandas Dataframe to Spark Dataframe in Pyspark
Did my research, but didn't find anything on this. I want to convert a simple pandas.DataFrame to a Spark DataFrame, like this:
df = pd.DataFrame({'col1': ['a', 'b', 'c'], 'col2': [1, 2, 3]})
sc_sql.createDataFrame(df, schema=df.columns.tolist())
The error I get is:
TypeError: Can not infer schema for type: <class 'str'>
I tried something even simpler:
df = pd.DataFrame([1, 2, 3])
sc_sql.createDataFrame(df)
And I get:
TypeError: Can not infer schema for type: <class 'numpy.int64'>
Any help? Do I need to manually specify a schema or something?
sc_sql is a pyspark.sql.SQLContext; I am in a Jupyter notebook on Python 3.4 and Spark 1.6.
Thanks!
It's related to your Spark version: later Spark releases make type inference more intelligent, so this works out of the box there. On Spark 1.6 you can fix it by supplying an explicit schema, like this:
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

mySchema = StructType([
    StructField("col1", StringType(), True),
    StructField("col2", IntegerType(), True)
])
sc_sql.createDataFrame(df, schema=mySchema)
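If you would rather not spell out the schema, another workaround (a sketch, not part of the original answer) is to convert the pandas values to native Python types first, since the numpy.int64 in the error message is exactly what Spark 1.6's inference chokes on. Using the asker's two-column frame, to_records(...).tolist() is one convenient conversion; the final createDataFrame call is shown commented out because it needs a live SQLContext:

```python
import pandas as pd

df = pd.DataFrame({'col1': ['a', 'b', 'c'], 'col2': [1, 2, 3]})

# pandas stores the integers as numpy.int64; converting the frame to a
# list of plain tuples turns them into native Python str and int values.
records = df.to_records(index=False).tolist()
print(records)              # [('a', 1), ('b', 2), ('c', 3)]
print(type(records[0][1]))  # <class 'int'>, not numpy.int64

# These records can then be passed to Spark with the column names as schema:
# sc_sql.createDataFrame(records, schema=df.columns.tolist())
```

This side-steps the inference error because Spark sees only built-in Python types, at the cost of an extra copy of the data.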