Write repeated Strings to BigQuery using Apache Beam
I have a data stream containing Strings which look like JSONArrays. I want to parse those Strings and write them to a BigQuery table using Apache Beam, but I am getting an error while writing repeated Strings.
Here's how I convert my string to a TableRow:
String dataString = "[{\"EMAIL\": [\"zog@yahoo.com\"]}]";
JSONArray jsonArray = new JSONArray(dataString);
TableRow tableRow = new TableRow();
for (int i = 0; i < jsonArray.length(); i++) {
    JSONArray emailArray = new JSONArray(jsonArray.getJSONObject(i).get("EMAIL").toString());
    tableRow.set("EMAIL", emailArray); // Results in error
}
Here's what my BigQuery schema looks like:
[
  {
    "name": "EMAIL",
    "type": "STRING",
    "mode": "REPEATED"
  }
]
I have managed to write a similar repeated String to a BigQuery table using Python, but I am unable to do it using Apache Beam. I suppose I am not saving the right key-value pair in TableRow. The error I am getting now is:
java.io.IOException: Insert failed: [{"errors":[{"debugInfo":"","location":"email","message":"This field is not a record.","reason":"invalid"}],"index":0}]
I need help with saving a similar repeated String to BigQuery without creating a record, and would appreciate any advice or suggestions. Thanks in advance.
It seems you want to create
Note that it seems your ValidFrom field is of type STRING, not a repeated field, unless it is wrapped in a repeated field in a hierarchical schema.
In the example code you provided, you are creating a JSONArray and putting it into the STRING field, which I think causes issues because the types are incompatible. If you want to keep it as a plain STRING field, you can use Solution 1 below (the concatenated string variant).
Also make sure that the name of your column in BigQuery matches the one in your code; I see you use both ValidFrom and EMAIL (that might be a mistake in your posted code, though).
In case you want to add one row with a concatenated String field in BigQuery, you can use the following:
// Initialize your final row
TableRow tableRow = new TableRow();
// Find email addresses
String[] emails = ... // your extraction logic
// Build a concatenated string of emails
String allEmails = String.join(";", emails);
// Add the string field to the row (Java string literals need double quotes)
tableRow.set("EMAILS", allEmails);
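One thing to keep in mind with the concatenated approach is that whoever reads the column has to split it again, so the delimiter must not occur inside a value. Below is a minimal sketch of that round trip using only the Java standard library. The class and method names are made up for illustration, and the second address is a placeholder.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Beam or BigQuery.
class ConcatenatedEmails {
    static final String DELIMITER = ";";

    // Joins the values into one string, rejecting values that contain the delimiter.
    static String join(List<String> emails) {
        for (String email : emails) {
            if (email.contains(DELIMITER)) {
                throw new IllegalArgumentException("Value contains delimiter: " + email);
            }
        }
        return String.join(DELIMITER, emails);
    }

    // Splits a stored string back into the individual values.
    static List<String> split(String stored) {
        if (stored.isEmpty()) {
            return Collections.emptyList();
        }
        // Limit -1 keeps trailing empty values instead of dropping them.
        return Arrays.asList(stored.split(Pattern.quote(DELIMITER), -1));
    }

    public static void main(String[] args) {
        String stored = join(Arrays.asList("zog@yahoo.com", "zog@example.com"));
        System.out.println(stored);
        System.out.println(split(stored));
    }
}
```

The string returned by join is what you would pass to tableRow.set for the plain STRING column.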
In case you want to insert multiple rows, you can create multiple table rows:
// Find email addresses
String[] emails = ... // your extraction logic
// Build a row per email
for (String email : emails) {
    // Initialize your final row
    TableRow tableRow = new TableRow();
    tableRow.set("EMAIL", email);
    // TODO: do something with the row (add to list, or ...)
}
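The TODO above could, for example, be filled in by collecting the rows into a list. Here is a sketch of that, with a plain Map standing in for TableRow so that it runs without the BigQuery client library on the classpath. That substitution is an assumption made for this example only, and the helper name is hypothetical; with the real class you would call new TableRow().set("EMAIL", email) instead.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; Map<String, Object> is a stand-in for TableRow.
class RowPerEmail {
    // Builds one single-column row per email address.
    static List<Map<String, Object>> toRows(String[] emails) {
        List<Map<String, Object>> rows = new ArrayList<>();
        for (String email : emails) {
            Map<String, Object> row = new LinkedHashMap<>();
            row.put("EMAIL", email); // with TableRow: row.set("EMAIL", email)
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        String[] emails = {"zog@yahoo.com", "zog@example.com"};
        System.out.println(toRows(emails));
    }
}
```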
In case you want to add one row with a REPEATED STRING field in BigQuery, you can use the following:
// Initialize your final row
TableRow tableRow = new TableRow();
// Find email addresses
String[] emails = ... // your extraction logic
// Build the repeated field
List<String> emailCells = new ArrayList<>();
for (String email : emails) {
    emailCells.add(email);
}
// Add the repeated field to the row (Java string literals need double quotes)
tableRow.set("EMAILS", emailCells);
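Applied to the code in the question, the idea would be to copy the elements of the JSONArray into a List<String> before calling set, so that the row holds a plain Java list rather than a JSON wrapper object. Below is a minimal sketch, assuming your version of org.json has JSONArray implement Iterable<Object> (recent releases do, but check yours). The helper name is made up, and the main method uses an ordinary list in place of the JSONArray so the snippet runs without extra dependencies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Beam, BigQuery or org.json.
class RepeatedStrings {
    // Copies every element into a List<String>. Accepts any Iterable,
    // which is assumed here to include org.json's JSONArray.
    static List<String> toStringList(Iterable<?> values) {
        List<String> result = new ArrayList<>();
        for (Object value : values) {
            result.add(String.valueOf(value));
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for jsonArray.getJSONObject(i).getJSONArray("EMAIL")
        Iterable<Object> emailArray = Arrays.<Object>asList("zog@yahoo.com");
        List<String> emails = toStringList(emailArray);
        // With a real TableRow: tableRow.set("EMAIL", emails);
        System.out.println(emails);
    }
}
```

The key passed to set should match the column name in the schema, which is EMAIL in the question.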
If this is not what you're aiming for, please provide some more details.