Extract multiple columns from a JSON string
I have JSON data that I want to represent in tabular form and later write to a different format (Parquet).
Schema
root
|-- : string (nullable = true)
Sample data
+----------------------------------------------+
+----------------------------------------------+
|{"deviceTypeId":"A2A","deviceId":"123","geo...|
|{"deviceTypeId":"A2B","deviceId":"456","geo...|
+----------------------------------------------+
Expected output
+--------------+--------+---+
| deviceTypeId|deviceId|...|
+--------------+--------+---+
| A2A| 123| |
| A2B| 456| |
+--------------+--------+---+
I tried splitting the string on commas, but this doesn't seem like an efficient approach:
split_col = split(df_explode[''], ',')
I then extract the columns, but the result also keeps the initial string:
df_1 = df_explode.withColumn('deviceId',split_col.getItem(1))
# df_1 = df_explode.withColumn('deviceTypeId',split_col.getItem(0))
printOutput(df_1)
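To illustrate why splitting on commas is fragile, here is a minimal plain-Python sketch (no Spark involved). The record and its `note` field are hypothetical, chosen only to show a value that contains a comma. The original sample rows are truncated, so their real fields are not assumed here.

```python
import json

# Hypothetical record; "note" is an invented field whose value contains a comma.
raw = '{"deviceTypeId":"A2A","deviceId":"123","note":"a,b"}'

# Naive approach: split on commas. Keys, quotes, and braces stay attached,
# and the comma inside "a,b" produces an extra, broken fragment.
parts = raw.split(",")

# Parsing approach: let a JSON parser handle quoting and structure.
record = json.loads(raw)

print(parts[1])            # fragment still carries the key and quotes
print(record["deviceId"])  # clean value
```

A JSON-aware parser avoids both problems, which is the same reason a JSON function is preferable to `split` inside Spark.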
I'm looking for a better way to solve this problem.
The explode function only works on array columns.
Your column holds JSON strings, so you should use the from_json function instead.
See from_json in pyspark.sql.functions.
I was able to do it using the from_json function.
# Convert the JSON string column into multiple columns
from pyspark.sql.functions import col, from_json

schema = getSchema()
dfJSON = df_explode.withColumn("jsonData", from_json(col(''), schema)) \
    .select("jsonData.*")
dfJSON.printSchema()
dfJSON.limit(100).toPandas()
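We need to create a schema that from_json will use to parse the JSON data.
from pyspark.sql.types import StructType, StructField, StringType

def getSchema():
    # Each StructField names a JSON key and the type to parse it as
    schema = StructType([
        StructField('deviceTypeId', StringType()),
        StructField('deviceId', StringType()),
        ...
    ])
    return schema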
The column name is an empty string in this JSON data, so col('') is called with an empty string.