How to insert complex, dynamic nested JSON into BigQuery using Java

I am trying to load JSON that comes from an xAPI system into BigQuery. The JSON looks similar to the one available in the link, but the schema on the BigQuery side is slightly altered.

Example: the JSON for the child element verb is

"verb":{  
    "id":"http://adlnet.gov/expapi/verbs/failed",
    "display":{  
      "en-US":"failed"
    }
  }

The corresponding BigQuery schema is:

verb                        RECORD  NULLABLE    
verb.id                     STRING  NULLABLE    
verb.display                RECORD  REPEATED    
verb.display.stringValue    STRING  NULLABLE    
verb.display.languageCode   STRING  NULLABLE

If I use the Jackson ObjectMapper, the JSON is parsed, but the insert into BigQuery fails: in the JSON, display is a single record, whereas in BigQuery it is a list of records (mode REPEATED). Because of this mismatch, I am unable to insert such custom fields.

Please suggest a way to tackle this problem. In short, I face this issue wherever the JSON from the link above contains a language key such as "en".

Any help is appreciated.

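import com.fasterxml.jackson.annotation.JsonInclude.Include;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.SerializationFeature;
import java.util.HashMap;
ObjectMapper objectmapper = new ObjectMapper();
objectmapper.configure(SerializationFeature.WRITE_NULL_MAP_VALUES, false);
objectmapper.setSerializationInclusion(Include.NON_NULL);
HashMap<String, Object> tempResult = objectmapper.readValue(stageJson, HashMap.class);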

The display node is also a record in BigQuery. Here is how the schema must be set:

[
 {
   "name": "verb",
   "type": "RECORD",
   "mode": "NULLABLE",
   "fields": [
       {
         "name": "id",
         "type": "STRING",
         "mode": "NULLABLE"
       },
       {
         "name": "display",
         "type": "RECORD",
         "mode": "REPEATED",
         "fields": [
            {
              "name": "enUS",
              "type": "STRING",
              "mode": "NULLABLE"
            }
         ]
       }
    ]
 }
]

The problem with your verb example is that display is not wrapped in square brackets, so it is a single record rather than an array of records. There is a second issue with the hyphen "-" in en-US: field names can only contain letters, numbers, and underscores, so the key must be renamed to, e.g., enUS.
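Both issues described above can be handled on the parsed map before it is sent to BigQuery. The following is a minimal sketch, assuming the JSON has already been read into nested Map objects (for example with the ObjectMapper from the question). DisplayNormalizer, sanitizeFieldName, and normalizeVerb are hypothetical names used for illustration; they are not part of Jackson or the BigQuery client.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Jackson or the BigQuery client.
class DisplayNormalizer {

    // Drops every character that is not a letter, digit, or underscore,
    // so "en-US" becomes "enUS" as suggested in the answer.
    static String sanitizeFieldName(String name) {
        return name.replaceAll("[^A-Za-z0-9_]", "");
    }

    // Returns a copy of the "verb" map in which "display" is a one-element
    // list holding a record with sanitized keys. The input map is not modified.
    static Map<String, Object> normalizeVerb(Map<String, Object> verb) {
        Map<String, Object> result = new LinkedHashMap<>(verb);
        Object display = verb.get("display");
        if (display instanceof Map) {
            Map<String, Object> record = new LinkedHashMap<>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) display).entrySet()) {
                record.put(sanitizeFieldName(String.valueOf(e.getKey())), e.getValue());
            }
            List<Object> repeated = new ArrayList<>();
            repeated.add(record);
            result.put("display", repeated);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> display = new LinkedHashMap<>();
        display.put("en-US", "failed");
        Map<String, Object> verb = new LinkedHashMap<>();
        verb.put("id", "http://adlnet.gov/expapi/verbs/failed");
        verb.put("display", display);
        // prints {id=http://adlnet.gov/expapi/verbs/failed, display=[{enUS=failed}]}
        System.out.println(normalizeVerb(verb));
    }
}
```

Note that stripping characters can make two different keys collide (for example "en-US" and "enUS"), so a real pipeline would likely need a collision check.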

Here is the schema as displayed in the UI if your verb example (after the enUS correction) is imported using schema auto-detection:

verb                RECORD  NULLABLE    
verb.display        RECORD  NULLABLE    
verb.display.enUS   STRING  NULLABLE    
verb.id             STRING  NULLABLE    

The types are correct; however, the display mode is detected as NULLABLE because there are no [ ].

Because the BigQuery JSON import format is newline-delimited (one JSON object per line), the import must be in this format:

{"verb":{"id":"http://adlnet.gov/expapi/verbs/failed","display":[{"enUS":"failed"}]}}

With the brackets in place, the display mode is detected as REPEATED.
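If the table keeps the schema shown in the question (verb.display.languageCode and verb.display.stringValue) instead of an enUS column, the language map can be reshaped into a list of records, one per language. In that layout the language tag becomes a value rather than a field name, so the hyphen is no longer a problem. This is only a sketch under that assumption; LanguageMapReshaper and toRepeatedRecords are hypothetical names, and the second language entry in main is made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; the field names come from the schema in the question.
class LanguageMapReshaper {

    // Turns {"en-US": "failed", ...} into
    // [{"languageCode": "en-US", "stringValue": "failed"}, ...].
    static List<Map<String, Object>> toRepeatedRecords(Map<String, ?> languageMap) {
        List<Map<String, Object>> rows = new ArrayList<>();
        for (Map.Entry<String, ?> entry : languageMap.entrySet()) {
            Map<String, Object> row = new LinkedHashMap<>();
            row.put("languageCode", entry.getKey());
            row.put("stringValue", entry.getValue());
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<String, Object> display = new LinkedHashMap<>();
        display.put("en-US", "failed");
        display.put("de-DE", "durchgefallen"); // made-up entry for illustration
        System.out.println(toRepeatedRecords(display));
    }
}
```

The returned list can replace the original display value in the parsed map before the row is serialized or inserted.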

Once you have your schema sorted out and a well-formed, valid JSON file, simply use the BigQuery Java API to upload it; there is no need for a complex flow or a third-party library to serialize the JSON beforehand.
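As a rough illustration of the newline-delimited file mentioned above, the sketch below writes already-serialized JSON objects to a file, one per line, using only the Java standard library. NdjsonWriter is a hypothetical helper, and the upload step itself (the BigQuery Java API call) is deliberately left out.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; it only prepares the file and does not talk to BigQuery.
class NdjsonWriter {

    // Writes each JSON object on its own line. A row that contains a line
    // break (for example pretty-printed JSON) is rejected before any write.
    static void write(Path target, List<String> jsonRows) throws IOException {
        List<String> lines = new ArrayList<>();
        for (String row : jsonRows) {
            if (row.indexOf('\n') >= 0 || row.indexOf('\r') >= 0) {
                throw new IllegalArgumentException("row must be a single line: " + row);
            }
            lines.add(row);
        }
        Files.write(target, lines, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("verbs", ".json");
        List<String> rows = new ArrayList<>();
        rows.add("{\"verb\":{\"id\":\"http://adlnet.gov/expapi/verbs/failed\","
                + "\"display\":[{\"enUS\":\"failed\"}]}}");
        write(target, rows);
        System.out.println(Files.readAllLines(target, StandardCharsets.UTF_8));
    }
}
```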

Source:

Specifying a Schema

Specifying nested and repeated columns
