
TypeError converting a Pandas Dataframe to Spark Dataframe in Pyspark

I did my research but didn't find anything on this. I want to convert a simple pandas.DataFrame to a Spark dataframe, like this:

df = pd.DataFrame({'col1': ['a', 'b', 'c'], 'col2': [1, 2, 3]})
sc_sql.createDataFrame(df, schema=df.columns.tolist()) 

The error I get is:

TypeError: Can not infer schema for type: <class 'str'>

I tried something even simpler:

df = pd.DataFrame([1, 2, 3])
sc_sql.createDataFrame(df)

And I get:

TypeError: Can not infer schema for type: <class 'numpy.int64'>

Any help? Do I need to manually specify a schema?

sc_sql is a pyspark.sql.SQLContext; I am in a Jupyter notebook on Python 3.4 and Spark 1.6.

Thanks!

It's related to your Spark version; later Spark releases make type inference more intelligent. You can fix this by specifying the schema explicitly:

from pyspark.sql.types import StructType, StructField, StringType, IntegerType

mySchema = StructType([
    StructField("col1", StringType(), True),
    StructField("col2", IntegerType(), True),
])
sc_sql.createDataFrame(df, schema=mySchema)
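For context on why the error appears at all: Spark 1.6's schema inference does not understand numpy scalar types such as numpy.int64, and passing `schema=df.columns.tolist()` only supplies column names, not types. A minimal sketch of an alternative workaround (variable names here are illustrative, not from the original post) is to convert the frame's values to native Python types first, so inference can succeed on its own:

```python
import pandas as pd

# Spark 1.6 cannot infer a schema from numpy scalars (e.g. numpy.int64)
# that pandas stores internally. Converting each value to a native
# Python type sidesteps the inference failure without a StructType.
df = pd.DataFrame({'col1': ['a', 'b', 'c'], 'col2': [1, 2, 3]})

rows = [
    # numpy scalars expose .item() to unwrap to plain int/float;
    # ordinary Python strings pass through unchanged.
    tuple(v.item() if hasattr(v, 'item') else v for v in row)
    for row in df.itertuples(index=False)
]

print(rows)              # [('a', 1), ('b', 2), ('c', 3)]
print(type(rows[0][1]))  # <class 'int'>

# sc_sql.createDataFrame(rows, schema=df.columns.tolist()) can then
# infer the types from plain Python str/int values.
```

The explicit StructType above remains the more robust option, since it also documents the intended types rather than relying on inference.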
