How to insert a complex dynamic nested JSON into BigQuery using Java
I am trying to load JSON coming from an xAPI system into BigQuery. The JSON looks similar to the one available in the link, but the schema of the BigQuery table is slightly altered.
Example: the JSON for the child element verb is
"verb":{
"id":"http://adlnet.gov/expapi/verbs/failed",
"display":{
"en-US":"failed"
}
}
The schema looks like this:
verb RECORD NULLABLE
verb.id STRING NULLABLE
verb.display RECORD REPEATED
verb.display.stringValue STRING NULLABLE
verb.display.languageCode STRING NULLABLE
If I use the Jackson ObjectMapper, the JSON gets parsed, but it does not get inserted into BigQuery: in the JSON, display is a single record, whereas in BigQuery it is a list of records. There seems to be a mismatch, and I am unable to insert such custom fields.
Please suggest a way to tackle this problem. In short, I face this issue wherever the JSON from the link above contains "en".
Any help is appreciated.
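import com.fasterxml.jackson.annotation.JsonInclude.Include;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.SerializationFeature;
import java.util.HashMap;
ObjectMapper objectmapper = new ObjectMapper();
objectmapper.configure(SerializationFeature.WRITE_NULL_MAP_VALUES, false);
objectmapper.setSerializationInclusion(Include.NON_NULL);
HashMap<String, Object> tempResult = objectmapper.readValue(stageJson, HashMap.class);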
The display node is also a record in BigQuery. Here is how the schema must be set:
[
{
"name": "verb",
"type": "RECORD",
"mode": "NULLABLE",
"fields": [
{
"name": "id",
"type": "STRING",
"mode": "NULLABLE"
},
{
"name": "display",
"type": "RECORD",
"mode": "REPEATED",
"fields": [
{
"name": "enUS",
"type": "STRING",
"mode": "NULLABLE"
}
]
}
]
}
]
The problem with your verb example is the absence of square brackets around the value of the display field. Besides, there is another issue with the hyphen "-" in en-US: field names can only contain letters, numbers, and underscores, so the name must be changed to, e.g., enUS.
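The renaming can be done in code before the row is sent to BigQuery. Below is a minimal sketch, assuming the JSON has already been parsed into nested java.util.Map and List values (for instance by Jackson, as in the question). The class and method names are hypothetical helpers, not part of any BigQuery or Jackson API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of any BigQuery or Jackson API.
final class FieldNameSanitizer {

    // Drops every character that is not a letter, digit, or underscore,
    // so "en-US" becomes "enUS".
    static String sanitize(String name) {
        return name.replaceAll("[^A-Za-z0-9_]", "");
    }

    // Recursively copies a parsed JSON value, sanitizing every map key.
    static Object sanitizeKeys(Object value) {
        if (value instanceof Map<?, ?>) {
            Map<String, Object> out = new LinkedHashMap<>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                out.put(sanitize(String.valueOf(e.getKey())), sanitizeKeys(e.getValue()));
            }
            return out;
        }
        if (value instanceof List<?>) {
            List<Object> out = new ArrayList<>();
            for (Object item : (List<?>) value) {
                out.add(sanitizeKeys(item));
            }
            return out;
        }
        return value; // strings, numbers, booleans, and null are left as-is
    }

    public static void main(String[] args) {
        Map<String, Object> display = new LinkedHashMap<>();
        display.put("en-US", "failed");
        Map<String, Object> verb = new LinkedHashMap<>();
        verb.put("id", "http://adlnet.gov/expapi/verbs/failed");
        verb.put("display", display);
        System.out.println(sanitizeKeys(verb));
    }
}
```

Note that two different keys can collapse into the same name after sanitizing (for example "en-US" and "enUS"); this sketch silently keeps the last one, so real data may need a collision check.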
Here is the schema as displayed in the UI if your verb example (after the enUS correction) is imported using schema auto-detection:
verb RECORD NULLABLE
verb.display RECORD NULLABLE
verb.display.enUS STRING NULLABLE
verb.id STRING NULLABLE
The types are correct; however, the mode of display is detected as NULLABLE because there are no [ ] around its value.
Because the BigQuery JSON import format is newline-delimited, the import must be in this format, with each row on a single line:
{"verb":{"id":"http://adlnet.gov/expapi/verbs/failed","display":[{"enUS":"failed"}]}}
Then the mode of display is detected as REPEATED.
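A line in that shape can be produced programmatically. The sketch below wraps the display record in a one-element list and writes the row as a single JSON line. It assumes the row is held in nested java.util.Map values; NdjsonRow and its methods are hypothetical names, and the small hand-written JSON writer is only there to keep the example dependency-free (a JSON library would normally do this step).

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; a JSON library would normally do the writing.
final class NdjsonRow {

    // Returns a copy of the verb map in which "display" is wrapped in a
    // one-element list, so that it matches a REPEATED RECORD column.
    static Map<String, Object> wrapDisplay(Map<String, Object> verb) {
        Map<String, Object> out = new LinkedHashMap<>(verb);
        Object display = out.get("display");
        if (display instanceof Map<?, ?>) {
            out.put("display", Collections.singletonList(display));
        }
        return out;
    }

    // Minimal JSON writer for maps, lists, strings, numbers, booleans, null.
    static String toJson(Object value) {
        if (value instanceof Map<?, ?>) {
            StringBuilder sb = new StringBuilder("{");
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                if (sb.length() > 1) sb.append(',');
                sb.append(quote(String.valueOf(e.getKey())))
                  .append(':')
                  .append(toJson(e.getValue()));
            }
            return sb.append('}').toString();
        }
        if (value instanceof List<?>) {
            StringBuilder sb = new StringBuilder("[");
            for (Object item : (List<?>) value) {
                if (sb.length() > 1) sb.append(',');
                sb.append(toJson(item));
            }
            return sb.append(']').toString();
        }
        if (value instanceof String) return quote((String) value);
        return String.valueOf(value); // numbers, booleans, null
    }

    // Escapes quotes, backslashes, and control characters.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') sb.append('\\').append(c);
            else if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
            else sb.append(c);
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        Map<String, Object> display = new LinkedHashMap<>();
        display.put("enUS", "failed");
        Map<String, Object> verb = new LinkedHashMap<>();
        verb.put("id", "http://adlnet.gov/expapi/verbs/failed");
        verb.put("display", display);
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("verb", wrapDisplay(verb));
        System.out.println(toJson(row)); // one row per line, no pretty-printing
    }
}
```

Keeping the writer free of line breaks and indentation is what makes the output usable as newline-delimited JSON: every row must fit on exactly one line.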
Once you have your schema sorted out and a well-formed, valid JSON file, simply use the BigQuery Java API to upload it; there is no need for a complex flow or a third-party library to serialize the JSON beforehand.
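If the table has to keep the schema from the question, where verb.display is a repeated record with languageCode and stringValue fields, the language map can instead be reshaped before the row is written. This is a minimal sketch under that assumption; LanguageMapReshaper is a hypothetical helper, and the input is assumed to be the parsed display object as a java.util.Map.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of any BigQuery or xAPI library.
final class LanguageMapReshaper {

    // Turns {"en-US": "failed"} into
    // [{"languageCode": "en-US", "stringValue": "failed"}],
    // one record per language entry.
    static List<Map<String, Object>> toRepeatedRecords(Map<String, ?> languageMap) {
        List<Map<String, Object>> records = new ArrayList<>();
        for (Map.Entry<String, ?> e : languageMap.entrySet()) {
            Map<String, Object> record = new LinkedHashMap<>();
            record.put("languageCode", e.getKey());
            record.put("stringValue", e.getValue());
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        Map<String, Object> display = new LinkedHashMap<>();
        display.put("en-US", "failed");
        System.out.println(toRepeatedRecords(display));
    }
}
```

With this shape the language tag is stored as a value rather than as a field name, so the hyphen in en-US no longer conflicts with the field-naming rule.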