
PySpark: TypeError: StructType can not accept object in type <type 'unicode'> or <type 'str'>

I am reading data from a CSV file and then creating a DataFrame. But when I try to access the data in the DataFrame, I get a TypeError.

fields = [StructField(field_name, StringType(), True) for field_name in schema.split(',')]
schema = StructType(fields)

input_dataframe = sql_context.createDataFrame(input_data_1, schema)

print input_dataframe.filter(input_dataframe.diagnosis_code == '11').count()

Both 'unicode' and 'str' fail with the Spark DataFrame. I get the following TypeError:

TypeError: StructType can not accept object in type <type 'unicode'>

I tried encoding to 'utf-8' as below, but I still get the error, now complaining about a TypeError with 'str':

input_data_2 = input_data_1.map(lambda x: x.encode("utf-8"))
input_dataframe = sql_context.createDataFrame(input_data_2, schema)

print input_dataframe.filter(input_dataframe.diagnosis_code == '410.11').count()
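The encode step cannot help here, because the result is still a single scalar per record rather than a row with one element per StructField. A plain-Python sketch of the difference (the sample record is assumed):

```python
# One CSV line read as a single unicode string (sample record, assumed):
record = u"410.11,foo,2"

# Encoding changes the type (bytes in Python 3, str in Python 2), but the
# result is still ONE scalar value, not a row of fields:
encoded = record.encode("utf-8")
print(type(encoded).__name__)

# Splitting is what actually produces a row-like sequence of fields:
fields = record.split(",")
print(fields)  # → ['410.11', 'foo', '2']
```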

I also tried parsing the CSV directly as utf-8 or unicode using the parameter use_unicode=True/False.

Reading between the lines. You are

reading data from a CSV file

and get

TypeError: StructType can not accept object in type <type 'unicode'>

This happens because you pass a string, not an object compatible with the struct. You probably pass data like:

input_data_1 = sc.parallelize(["1,foo,2", "2,bar,3"])

and a schema

schema = "x,y,z"

fields = [StructField(field_name, StringType(), True) for field_name in schema.split(',')]
schema = StructType(fields)

and you expect Spark to figure things out. But it doesn't work that way. You could split each record into fields yourself:

input_dataframe = sqlContext.createDataFrame(input_data_1.map(lambda s: s.split(",")), schema)
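What the map step does per record can be sketched in plain Python (sample records taken from above):

```python
input_records = ["1,foo,2", "2,bar,3"]  # the sample data from above

# Each CSV string becomes a list with one element per StructField, which is
# what a three-field StructType of StringType columns will accept:
rows = [s.split(",") for s in input_records]
print(rows)  # → [['1', 'foo', '2'], ['2', 'bar', '3']]
```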

but honestly, just use the Spark CSV reader:

spark.read.schema(schema).csv("/path/to/file")



 