
Extract Schema from nested Json-String column in Pyspark

Assuming I have the following table:

body
{"Day":1,"vals":[{"id":"1", "val":"3"}], {"id":"2", "val":"4"}}

My goal is to write down the schema in PySpark for this nested JSON column. I've tried the following two things:

schema = StructType([
  StructField("Day", StringType()),
  StructField(
  "vals",
  StructType([
    StructType([
      StructField("id", StringType(), True),
      StructField("val", DoubleType(), True)
    ])
    StructType([
      StructField("id", StringType(), True),
      StructField("val", DoubleType(), True)
    ])
  ])
  )
])

Here I get the following error:

'StructType' object has no attribute 'name'

Another approach was to declare the nested arrays as ArrayType:

schema = StructType([
  StructField("Day", StringType()),
  StructField(
  "vals",
  ArrayType(
    ArrayType(
        StructField("id", StringType(), True),
        StructField("val", DoubleType(), True)
      , True)
    ArrayType(
        StructField("id", StringType(), True),
        StructField("val", DoubleType(), True)
      , True)
    , True)
  )
])

Here I get the following error:

takes from 2 to 3 positional arguments but 5 were given

This probably happens because ArrayType only takes a single SQL type as its argument.

Can anybody tell me how they would create the schema? I'm a complete newbie to this whole topic.

Your second nested StructType needs a name:

from pyspark.sql.types import DoubleType, StringType, StructField, StructType

schema = StructType([
    StructField("Day", DoubleType()),
    StructField("vals", StructType([StructField("id", StringType()), StructField("val", DoubleType())])),
    StructField("vals2", StructType([StructField("id", StringType()), StructField("val", DoubleType())])),
])
