PySpark: TypeError: StructType can not accept object in type <type 'unicode'> or <type 'str'>
I am reading data from a CSV file and then creating a DataFrame. But when I try to access the data in the DataFrame, I get a TypeError.
from pyspark.sql.types import StructField, StringType, StructType

# 'schema' here is a comma-separated header string, e.g. "a,b,diagnosis_code" (hypothetical names)
fields = [StructField(field_name, StringType(), True) for field_name in schema.split(',')]
schema = StructType(fields)
input_dataframe = sql_context.createDataFrame(input_data_1, schema)
print input_dataframe.filter(input_dataframe.diagnosis_code == '11').count()
Both 'unicode' and 'str' fail with the Spark DataFrame. I get the TypeError below:
TypeError: StructType can not accept object in type <type 'unicode'>
I tried encoding to 'utf-8' as below, but I still get the error, now complaining about a TypeError with 'str' instead:
input_data_2 = input_data_1.map(lambda x: x.encode("utf-8"))
input_dataframe = sql_context.createDataFrame(input_data_2, schema)
print input_dataframe.filter(input_dataframe.diagnosis_code == '410.11').count()
I also tried parsing the CSV directly as utf-8 or unicode using the parameter use_unicode=True/False.
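As a plain-Python sketch of what the encoding attempt changes (hypothetical data, no Spark needed): .encode() swaps the string's type, but each row is still a single value rather than a sequence of field values, so the schema mismatch remains.

```python
# Hypothetical CSV line, a stand-in for what sc.textFile() yields per record.
line = u"410.11,some description"
encoded = line.encode("utf-8")   # different string type, same shape: one value

# Encoding does not turn the line into a row sequence, which is why the
# error message merely switches from 'unicode' to 'str'.
assert not isinstance(encoded, (list, tuple))
```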
Reading between the lines: you are
reading data from a CSV file
and get
TypeError: StructType can not accept object in type <type 'unicode'>
This happens because you pass a string, not an object compatible with the struct. You probably pass data like:
input_data_1 = sc.parallelize(["1,foo,2", "2,bar,3"])
and a schema
schema = "x,y,z"
fields = [StructField(field_name, StringType(), True) for field_name in schema.split(',')]
schema = StructType(fields)
and you expect Spark to figure things out. But it doesn't work that way. You could
input_dataframe = sqlContext.createDataFrame(input_data_1.map(lambda s: s.split(",")), schema)
but honestly, just use the Spark CSV reader:
spark.read.schema(schema).csv("/path/to/file")
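The fix in the map step above can be sketched in plain Python (no Spark required; the sample lines and field names are the hypothetical ones from this answer): splitting each line produces one value per schema field, which is the shape StructType accepts.

```python
# Plain-Python sketch of the fix, using the hypothetical data from above.
input_lines = ["1,foo,2", "2,bar,3"]              # stand-in for the RDD
rows = [line.split(",") for line in input_lines]  # mirrors .map(lambda s: s.split(","))

field_names = "x,y,z".split(",")                  # the three StringType fields
assert all(len(row) == len(field_names) for row in rows)
print(rows)  # [['1', 'foo', '2'], ['2', 'bar', '3']]
```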