PySpark: TypeError: StructType can not accept object 0.10000000000000001 in type <type 'numpy.float64'>
When using PySpark with the following code:
import numpy as np
from pyspark.sql.types import *
samples = np.array([0.1,0.2])
dfSchema = StructType([StructField("x", FloatType(), True)])
spark.createDataFrame(samples,dfSchema)
I get:
TypeError: StructType can not accept object 0.10000000000000001 in type <type 'numpy.float64'>
Any idea?
NumPy types, including numpy.float64, are not a valid external representation for Spark SQL types. Furthermore, the schema you use doesn't reflect the shape of the data: a StructType expects one record per row, not a bare scalar. You should use standard Python types and the corresponding DataType directly:
spark.createDataFrame(samples.tolist(), FloatType()).toDF("x")
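If you want to keep the original StructType schema instead, each row has to be a record of plain Python values, for example a one-element tuple per sample. A minimal sketch, assuming samples, dfSchema, and spark are defined as in the question:

# Convert each numpy.float64 to a Python float and wrap it in a tuple (one row per value)
rows = [(float(x),) for x in samples]
spark.createDataFrame(rows, dfSchema).show()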