TypeError: field col1: LongType can not accept object '' in type <class 'str'>
I have JSON in Python like this:
example = [{"col1":"","col2":"","col3":52272}, ...]
Columns of the JSON might be empty. An empty value is represented as "".
I created the Spark schema:
schema = StructType([
    StructField("col1", LongType(), True),
    StructField("col2", LongType(), True),
    StructField("col3", LongType(), True),
])
I try to create the Spark DataFrame like this:
pandas_df = pd.DataFrame(example)
spark_df = spark.createDataFrame(pandas_df, schema = schema)
But I get this error:
TypeError: field col1: LongType can not accept object '' in type <class 'str'>
How can I fix this error? The same error occurs if I use other types for this column.
As @tdelaney commented, your schema doesn't reflect your data.
You could try something like this:
from pyspark.sql import SparkSession
import pyspark.sql.functions as F
from pyspark.sql.types import IntegerType, StringType, StructField, StructType
if __name__ == "__main__":
    spark = SparkSession.builder.master("local").appName("Test").getOrCreate()
    data = [{"col1": "", "col2": "", "col3": 52272}]
    # col1 and col2 hold strings in the data, so declare them as StringType
    schema = StructType(
        [
            StructField("col1", StringType(), True),
            StructField("col2", StringType(), True),
            StructField("col3", IntegerType(), True),
        ]
    )
    df = spark.createDataFrame(data=data, schema=schema)
    df.show(truncate=False)
Which gives:
+----+----+-----+
|col1|col2|col3 |
+----+----+-----+
|    |    |52272|
+----+----+-----+
If, for example, you want to replace empty strings with None, you could use:
df = df.withColumn("col2", F.when(F.col("col2") != "", F.col("col2")).otherwise(None))
Which gives:
+----+----+-----+
|col1|col2|col3 |
+----+----+-----+
|    |null|52272|
+----+----+-----+
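If the columns should stay LongType as in the question, another option is to turn the empty strings into None in plain Python before building the DataFrame. The following is only a minimal sketch and not part of the original answer: the helper name `blank_to_none` is made up for illustration, and it assumes the records are a list of flat dicts.

```python
def blank_to_none(records):
    """Return a copy of records with every empty-string value replaced by None."""
    return [
        {key: (None if value == "" else value) for key, value in record.items()}
        for record in records
    ]


example = [{"col1": "", "col2": "", "col3": 52272}]
cleaned = blank_to_none(example)  # the input list is left unchanged
print(cleaned)
```

The cleaned list should then be accepted by spark.createDataFrame(cleaned, schema=schema) with the all-LongType schema, since every field is declared nullable. Passing the list directly rather than going through pandas is a deliberate choice here, because pandas may convert None to NaN in numeric columns.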