How to insert a complex dynamic nested JSON into BigQuery using Java

I am working with JSON that comes from an xAPI system; it looks similar to the one available in the link. I need to insert it into BigQuery, where the table schema is slightly altered.

Example: the child element verb in the JSON looks like this:

"verb":{  
    "id":"http://adlnet.gov/expapi/verbs/failed",
    "display":{  
      "en-US":"failed"
    }
  }

The corresponding BigQuery schema is:

verb                        RECORD  NULLABLE    
verb.id                     STRING  NULLABLE    
verb.display                RECORD  REPEATED    
verb.display.stringValue    STRING  NULLABLE    
verb.display.languageCode   STRING  NULLABLE

If I use the Jackson ObjectMapper, the JSON gets parsed, but it is not inserted into BigQuery: in the JSON, display is a single record, whereas in BigQuery it is a list of records. Because of this mismatch, I am unable to insert such custom fields.

Please suggest a way to tackle this problem. In short, I face this issue wherever the JSON from the link above contains "en".

Any help is appreciated.

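import com.fasterxml.jackson.annotation.JsonInclude.Include;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.SerializationFeature;
import java.util.HashMap;
ObjectMapper objectmapper = new ObjectMapper();
objectmapper.configure(SerializationFeature.WRITE_NULL_MAP_VALUES, false);
objectmapper.setSerializationInclusion(Include.NON_NULL);
HashMap<String, Object> tempResult = objectmapper.readValue(stageJson, HashMap.class);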

The display node is also a record in BigQuery. Here is how the schema must be set:

[
 {
   "name": "verb",
   "type": "RECORD",
   "mode": "NULLABLE",
   "fields": [
       {
         "name": "id",
         "type": "STRING",
         "mode": "NULLABLE"
       },
       {
         "name": "display",
         "type": "RECORD",
         "mode": "REPEATED",
         "fields": [
            {
              "name": "enUS",
              "type": "STRING",
              "mode": "NULLABLE"
            }
         ]
       }
    ]
 }
]

The problem with your verb example is the absence of square brackets around the display value: a REPEATED field has to be a JSON array. Besides, there is another issue with the hyphen "-" in en-US: field names can only contain letters, numbers, and underscores, so it must be renamed to, e.g., enUS.
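The renaming can be applied to the parsed map before it is sent anywhere. The following is only a minimal sketch, assuming the JSON has already been parsed into nested java.util.Map and List values (as the ObjectMapper call in the question does). The class FieldNames and its methods are hypothetical helpers, not part of Jackson or of the BigQuery client.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; not part of any Jackson or BigQuery API.
final class FieldNames {
    private FieldNames() {}

    // Drops every character that is not a letter, digit, or underscore,
    // so "en-US" becomes "enUS".
    static String sanitize(String name) {
        return name.replaceAll("[^A-Za-z0-9_]", "");
    }

    // Returns a copy of a parsed JSON value in which every map key is
    // sanitized, recursing into nested maps and lists. The input is not modified.
    static Object sanitizeKeys(Object value) {
        if (value instanceof Map<?, ?>) {
            Map<String, Object> out = new LinkedHashMap<>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                out.put(sanitize(String.valueOf(e.getKey())), sanitizeKeys(e.getValue()));
            }
            return out;
        }
        if (value instanceof List<?>) {
            List<Object> out = new ArrayList<>();
            for (Object item : (List<?>) value) {
                out.add(sanitizeKeys(item));
            }
            return out;
        }
        return value; // strings, numbers, booleans, and null pass through unchanged
    }
}
```

Calling FieldNames.sanitizeKeys(tempResult) would give a copy whose display key is enUS instead of en-US. The sketch does not handle two keys that collapse to the same name (for example en-US and en_US would stay distinct, but en-US and enUS would collide), nor names that end up starting with a digit; both cases would need an explicit rule.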

Here is the schema as displayed in the UI if your verb example (after the enUS correction) is imported using schema auto-detection:

verb                RECORD  NULLABLE    
verb.display        RECORD  NULLABLE    
verb.display.enUS   STRING  NULLABLE    
verb.id             STRING  NULLABLE    

The types are correct; however, the mode of display is detected as NULLABLE, because there are no [ ].

As the BigQuery JSON import format is newline-delimited, the import must be in this format:

{"verb":{"id":"http://adlnet.gov/expapi/verbs/failed","display":[{"enUS":"failed"}]}}

Then the mode of display is detected as REPEATED.
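If the table has to keep the schema from the question (verb.display as a REPEATED RECORD with languageCode and stringValue), the same idea applies: the language map must become a list of records. In that layout the language tag is stored as a value, so the hyphen in en-US is no longer a field-name problem. Below is a minimal sketch under that assumption; the class LanguageMaps and its methods are hypothetical names, and the input is assumed to be the nested Map produced by parsing the JSON.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; not part of any Jackson or BigQuery API.
final class LanguageMaps {
    private LanguageMaps() {}

    // Turns {"en-US": "failed"} into
    // [{"languageCode": "en-US", "stringValue": "failed"}].
    static List<Map<String, Object>> toRecords(Map<String, ?> languageMap) {
        List<Map<String, Object>> records = new ArrayList<>();
        for (Map.Entry<String, ?> e : languageMap.entrySet()) {
            Map<String, Object> record = new LinkedHashMap<>();
            record.put("languageCode", e.getKey());
            record.put("stringValue", e.getValue());
            records.add(record);
        }
        return records;
    }

    // Replaces parent[field] in place when it currently holds a JSON object;
    // any other value (missing, already a list, a scalar) is left alone.
    static void replaceLanguageMap(Map<String, Object> parent, String field) {
        Object value = parent.get(field);
        if (value instanceof Map<?, ?>) {
            @SuppressWarnings("unchecked") // keys of a parsed JSON object are strings
            Map<String, Object> languageMap = (Map<String, Object>) value;
            parent.put(field, toRecords(languageMap));
        }
    }
}
```

For the example in the question, this would be called on the verb map with the field name "display"; other xAPI language maps would need the same call at their own location in the document.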

Once you have your schema sorted out and a well-formed, valid JSON file, simply use the BigQuery Java API to upload it, without the need for a complex flow or a third-party library to serialize the JSON beforehand.
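To illustrate what "well-formed" means for the load file, here is a rough, dependency-free sketch that writes one JSON object per line. The class NdJson is a hypothetical name, and the serializer is deliberately tiny: it assumes rows built from Map, List, String, Number, Boolean, and null only, and it does not guard against NaN or infinite numbers. If Jackson is already on the classpath, ObjectMapper.writeValueAsString on each row is the more usual choice.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper that emits newline-delimited JSON (one row per line).
final class NdJson {
    private NdJson() {}

    // Serializes a single value (typically one row) to compact JSON.
    static String toJson(Object value) {
        StringBuilder sb = new StringBuilder();
        write(value, sb);
        return sb.toString();
    }

    // Serializes each row on its own line, each line terminated by '\n'.
    static String toLines(List<? extends Map<String, ?>> rows) {
        StringBuilder sb = new StringBuilder();
        for (Map<String, ?> row : rows) {
            write(row, sb);
            sb.append('\n');
        }
        return sb.toString();
    }

    private static void write(Object value, StringBuilder sb) {
        if (value == null || value instanceof Number || value instanceof Boolean) {
            sb.append(value); // null is appended as the literal "null"
        } else if (value instanceof Map<?, ?>) {
            sb.append('{');
            String sep = "";
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                sb.append(sep);
                writeString(String.valueOf(e.getKey()), sb);
                sb.append(':');
                write(e.getValue(), sb);
                sep = ",";
            }
            sb.append('}');
        } else if (value instanceof Iterable<?>) {
            sb.append('[');
            String sep = "";
            for (Object item : (Iterable<?>) value) {
                sb.append(sep);
                write(item, sb);
                sep = ",";
            }
            sb.append(']');
        } else {
            writeString(value.toString(), sb);
        }
    }

    // Writes a quoted JSON string, escaping quotes, backslashes, and control characters.
    private static void writeString(String s, StringBuilder sb) {
        sb.append('"');
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        sb.append('"');
    }
}
```

The string returned by NdJson.toLines(rows) can be written to a file as UTF-8, for example with java.nio.file.Files.write, and that file then loaded as newline-delimited JSON.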

Source:

Specifying a Schema

Specifying nested and repeated columns
