PySpark 'from_json': DataFrame returns null for all JSON values
I have the following log, which contains plain text and a JSON string:
2020-09-24T08:03:01.633Z 11.21.23.1 {"EventTime":"2020-09-24 13:33:01","Hostname":"abc-cde.india.local","Keywords":-1234}
I created a DataFrame from the log above, as shown below:
|Date      |Source IP |Event Type       |
|2020-09-24|11.21.23.1|{"EventTime":"202|
This is the schema I wrote to convert the JSON string into another DataFrame:
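from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

json_schema = StructType([
    StructField("EventTime", StringType()),
    StructField("Hostname", StringType()),
    StructField("Keywords", IntegerType())
])
# show() returns None, so call it separately instead of assigning its result
json_converted_df = df.select(F.from_json(F.col('Event Type'), json_schema).alias("data")).select("data.*")
json_converted_df.show()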
But the resulting DataFrame returns null for every field of the new JSON schema:
+---------+--------+--------+
|EventTime|Hostname|Keywords|
+---------+--------+--------+
|     null|    null|    null|
+---------+--------+--------+
How can I fix this?
Works fine for me...
# Preparation of test dataset (assumes an active SparkSession named `spark`, as in the pyspark shell)
a = [
    (
        "2020-09-24T08:03:01.633Z",
        "11.21.23.1",
        '{"EventTime":"2020-09-24 13:33:01","Hostname":"abc-cde.india.local","Keywords":-1234}',
    ),
]
b = ["Date", "Source IP", "Event Type"]
df = spark.createDataFrame(a, b)
df.show()
#+--------------------+----------+--------------------+
#|                Date| Source IP|          Event Type|
#+--------------------+----------+--------------------+
#|2020-09-24T08:03:...|11.21.23.1|{"EventTime":"202...|
#+--------------------+----------+--------------------+
df.printSchema()
#root
# |-- Date: string (nullable = true)
# |-- Source IP: string (nullable = true)
# |-- Event Type: string (nullable = true)
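The test row above is built by hand, so the JSON payload is already isolated in its own column. With a real log line the payload has to be cut out first, and that step is worth checking. The sketch below is plain Python and assumes the line is space-delimited; the question does not say how the DataFrame was actually built, and `split_log_line` is a hypothetical helper. It shows that the payload itself contains a space inside "EventTime", so a split on every space truncates the JSON.

```python
import json

# Sample line copied from the question
line = (
    '2020-09-24T08:03:01.633Z 11.21.23.1 '
    '{"EventTime":"2020-09-24 13:33:01","Hostname":"abc-cde.india.local","Keywords":-1234}'
)


def split_log_line(raw):
    """Split into (date, source_ip, payload), cutting only at the first two spaces."""
    # maxsplit=2 keeps the payload whole even though "EventTime" contains a space
    date, source_ip, payload = raw.split(" ", 2)
    return date, source_ip, payload


date, source_ip, payload = split_log_line(line)
event = json.loads(payload)  # raises json.JSONDecodeError if the payload is incomplete

# For contrast: splitting on every run of whitespace cuts the payload in two
naive_third_field = line.split()[2]
```

If the "Event Type" column was filled from something like `naive_third_field`, every row would hold incomplete JSON, which is one possible reason for all-null output.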
# Your code executed
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

json_schema = StructType(
    [
        StructField("EventTime", StringType()),
        StructField("Hostname", StringType()),
        StructField("Keywords", IntegerType()),
    ]
)
json_converted_df = df.select(
    F.from_json(F.col("Event Type"), json_schema).alias("data")
).select("data.*")
json_converted_df.show()
#+-------------------+-------------------+--------+
#|          EventTime|           Hostname|Keywords|
#+-------------------+-------------------+--------+
#|2020-09-24 13:33:01|abc-cde.india.local|   -1234|
#+-------------------+-------------------+--------+
json_converted_df.printSchema()
#root
# |-- EventTime: string (nullable = true)
# |-- Hostname: string (nullable = true)
# |-- Keywords: integer (nullable = true)
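Since the same schema parses the sample row correctly, the difference is probably in the actual contents of the "Event Type" column. By default from_json does not raise an error when a string cannot be parsed; it just yields nulls, so the raw strings need to be inspected separately. Below is a minimal sketch in plain Python for checking a few raw values. `check_payload` and `EXPECTED` are hypothetical names, and the type mapping simply mirrors the three fields in json_schema.

```python
import json

# Expected Python types for the fields declared in json_schema
EXPECTED = {"EventTime": str, "Hostname": str, "Keywords": int}
INT_MIN, INT_MAX = -2**31, 2**31 - 1  # range of Spark's 32-bit IntegerType


def check_payload(raw):
    """Return a list of problems found in one 'Event Type' string (empty if none)."""
    try:
        obj = json.loads(raw)
    except (TypeError, ValueError) as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(obj, dict):
        return ["top-level value is not a JSON object"]
    problems = []
    for name, expected_type in EXPECTED.items():
        if name not in obj:
            problems.append(f"missing field {name}")
            continue
        value = obj[name]
        # bool is a subclass of int in Python, so exclude it explicitly
        if isinstance(value, bool) or not isinstance(value, expected_type):
            problems.append(f"{name} has type {type(value).__name__}")
        elif expected_type is int and not INT_MIN <= value <= INT_MAX:
            problems.append(f"{name} is outside the 32-bit integer range")
    return problems
```

One way to use it is to pull a handful of rows to the driver, for example with `df.select("Event Type").limit(20).collect()`, and pass each `row["Event Type"]` to `check_payload`. Truncated strings, leading log text, or differently spelled field names would all show up here.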