
PySpark 'from_json': DataFrame returns null for all JSON values

I have the following log line, which contains plain text followed by a JSON string:

2020-09-24T08:03:01.633Z 11.21.23.1 {"EventTime":"2020-09-24 13:33:01","Hostname":"abc-cde.india.local","Keywords":-1234}

I created a DataFrame from the log above, as shown below:


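+----------+-----------+------------------+
|Date      |Source IP  |Event Type        |
+----------+-----------+------------------+
|2020-09-24|11.21.23.1 |{"EventTime":"202 |
+----------+-----------+------------------+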
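The post does not show how the log line was split into the three columns, so the following is only a minimal sketch in plain Python of one way to do it. The helper name `split_log_line` is hypothetical. It assumes the timestamp and IP address never contain spaces, so limiting the split to two separators keeps the JSON payload (which itself contains a space inside `EventTime`) in one piece.

```python
import json

LOG_LINE = (
    '2020-09-24T08:03:01.633Z 11.21.23.1 '
    '{"EventTime":"2020-09-24 13:33:01","Hostname":"abc-cde.india.local","Keywords":-1234}'
)


def split_log_line(line):
    """Split 'timestamp ip json' into three parts (hypothetical helper)."""
    # maxsplit=2 stops after the second space, so spaces inside the JSON survive
    timestamp, source_ip, payload = line.strip().split(" ", 2)
    return timestamp, source_ip, payload


timestamp, source_ip, payload = split_log_line(LOG_LINE)

# The third part should be a complete JSON object; json.loads raises if it is not
event = json.loads(payload)
print(timestamp[:10], source_ip, event["Keywords"])
```

If the third field parses cleanly here, the same string stored in the `Event Type` column should also be acceptable input for `from_json`.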

I then defined a schema to convert the JSON string into another DataFrame:

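from pyspark.sql import functions as F
from pyspark.sql.types import IntegerType, StringType, StructField, StructType

json_schema = StructType([
    StructField("EventTime", StringType()),
    StructField("Hostname", StringType()),
    StructField("Keywords", IntegerType())
])

# show() returns None, so keep the DataFrame and display it separately
json_converted_df = df.select(F.from_json(F.col('Event Type'), json_schema).alias("data")).select("data.*")
json_converted_df.show()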

However, the resulting DataFrame returns null for every field in the new JSON schema:

+---------+--------+--------+
|EventTime|Hostname|Keywords|
+---------+--------+--------+
|     null|    null|    null|
+---------+--------+--------+

How can I resolve this?

It works fine for me...

# Preparation of test dataset
# (assumes an active SparkSession named `spark`, as in the pyspark shell)

a = [
    (
        "2020-09-24T08:03:01.633Z",
        "11.21.23.1",
        '{"EventTime":"2020-09-24 13:33:01","Hostname":"abc-cde.india.local","Keywords":-1234}',
    ),
]

b = ["Date", "Source IP", "Event Type"]

df = spark.createDataFrame(a, b)

df.show()
#+--------------------+----------+--------------------+
#|                Date| Source IP|          Event Type|
#+--------------------+----------+--------------------+
#|2020-09-24T08:03:...|11.21.23.1|{"EventTime":"202...|
#+--------------------+----------+--------------------+

df.printSchema()
#root
# |-- Date: string (nullable = true)
# |-- Source IP: string (nullable = true)
# |-- Event Type: string (nullable = true)

# Your code executed
from pyspark.sql import functions as F
from pyspark.sql.types import IntegerType, StringType, StructField, StructType

json_schema = StructType(
    [
        StructField("EventTime", StringType()),
        StructField("Hostname", StringType()),
        StructField("Keywords", IntegerType()),
    ]
)

json_converted_df = df.select(
    F.from_json(F.col("Event Type"), json_schema).alias("data")
).select("data.*")

json_converted_df.show()
#+-------------------+-------------------+--------+
#|          EventTime|           Hostname|Keywords|
#+-------------------+-------------------+--------+
#|2020-09-24 13:33:01|abc-cde.india.local|   -1234|
#+-------------------+-------------------+--------+

json_converted_df.printSchema()
#root
# |-- EventTime: string (nullable = true)
# |-- Hostname: string (nullable = true)
# |-- Keywords: integer (nullable = true)
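Since the same code works on a clean test row, one possible explanation (an assumption, not something confirmed in this thread) is that the real `Event Type` values are not complete JSON objects, for example because they still carry leading log text or were truncated. `from_json` returns null when it cannot parse a string. The sketch below is a rough plain-Python check for that; `why_unparseable` is a hypothetical helper, and Python's `json` module is not identical to Spark's parser, so treat the result as a hint only.

```python
import json


def why_unparseable(value):
    """Return None if value looks like a JSON object string, else a short reason."""
    if value is None:
        return "value is None"
    text = value.strip()
    if not text.startswith("{"):
        return "does not start with '{' (extra leading text?)"
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return f"invalid JSON: {exc.msg}"
    return None


good = '{"EventTime":"2020-09-24 13:33:01","Hostname":"abc-cde.india.local","Keywords":-1234}'
with_prefix = "11.21.23.1 " + good  # payload still carries leading log text
truncated = '{"EventTime":"202'     # payload cut off mid-string

for sample in (good, with_prefix, truncated, None):
    print(repr(sample)[:30], "->", why_unparseable(sample))
```

To run the same check on the real data, the function could be wrapped with `F.udf(why_unparseable, StringType())` and applied to the `Event Type` column. Rows with a non-null reason are the ones to inspect first.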

