How to handle NULL Value in BigQuery while writing through Dataflow?
I am using the JdbcIO source connector and the BigQueryIO sink connector provided by Apache Beam to extract data from a database into BigQuery.
Below is my sample table data:
As we can see, several columns, such as id and booking_date, contain NULL values. So when I try to write the data to BigQuery, it fails with the following error:
"message": "Error while reading data, error message: JSON parsing error in row starting at position 0: Only optional fields can be set to NULL. Field: status; Value: NULL
If I pass null for booking_date, it fails with an invalid date format error.
Below is the RowMapper I use to convert the JdbcIO result set into a TableRow. It is the same code that the GCP JdbcToBigQuery template uses.
public TableRow mapRow(ResultSet resultSet) throws Exception {
  ResultSetMetaData metaData = resultSet.getMetaData();
  TableRow outputTableRow = new TableRow();
  for (int i = 1; i <= metaData.getColumnCount(); i++) {
    if (resultSet.getObject(i) == null) {
      outputTableRow.set(getColumnRef(metaData, i), resultSet.getObject(i));
      // outputTableRow.set(getColumnRef(metaData, i), String.valueOf(resultSet.getObject(i)));
      continue;
    }

    /*
     * DATE:      EPOCH MILLISECONDS -> yyyy-MM-dd
     * DATETIME:  EPOCH MILLISECONDS -> yyyy-MM-dd hh:mm:ss.SSSSSS
     * TIMESTAMP: EPOCH MILLISECONDS -> yyyy-MM-dd hh:mm:ss.SSSSSSXXX
     *
     * MySQL drivers have ColumnTypeName in all caps and postgres in small case
     */
    switch (metaData.getColumnTypeName(i).toLowerCase()) {
      case "date":
        outputTableRow.set(
            getColumnRef(metaData, i), dateFormatter.format(resultSet.getDate(i)));
        break;
      case "datetime":
        outputTableRow.set(
            getColumnRef(metaData, i),
            datetimeFormatter.format((TemporalAccessor) resultSet.getObject(i)));
        break;
      case "timestamp":
        outputTableRow.set(
            getColumnRef(metaData, i), timestampFormatter.format(resultSet.getTimestamp(i)));
        break;
      case "clob":
        Clob clobObject = resultSet.getClob(i);
        if (clobObject.length() > Integer.MAX_VALUE) {
          LOG.warn(
              "The Clob value size {} in column {} exceeds 2GB and will be truncated.",
              clobObject.length(),
              getColumnRef(metaData, i));
        }
        outputTableRow.set(
            getColumnRef(metaData, i), clobObject.getSubString(1, (int) clobObject.length()));
        break;
      default:
        outputTableRow.set(getColumnRef(metaData, i), resultSet.getObject(i).toString());
    }
  }
  return outputTableRow;
}
For more details, see JdbcToBigQuery.
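Solutions I tried without success:
When a value is null, I tried skipping that column entirely, but then it fails with the error Missing required field.
I also tried passing the null as a string (the commented-out String.valueOf line above), which fails with Could not convert value 'string_value: \t \"null\"' to integer.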
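The "Could not convert value ... to integer" error is consistent with how String.valueOf treats a null reference: it returns the four-character text "null" rather than a null, so an integer column would receive a string it cannot parse. A minimal sketch of that behaviour, using only the JDK (the class and method names here are hypothetical, and missingId merely stands in for resultSet.getObject(i) on a NULL column):

```java
class NullStringSketch {
    // Mirrors the commented-out mapper line: String.valueOf applied to a column value.
    static String stringify(Object value) {
        return String.valueOf(value);
    }

    public static void main(String[] args) {
        Object missingId = null; // stand-in for resultSet.getObject(i) on a NULL column
        String sent = stringify(missingId);

        // The result is a real String, not a null reference.
        System.out.println(sent == null);        // prints false
        System.out.println("null".equals(sent)); // prints true
    }
}
```

So this attempt replaces a missing value with literal text, which only a STRING column would accept.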
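How can I handle all of the NULL cases? Please note that I cannot simply ignore these rows, because a few of their columns do contain values.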
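To solve your issue, pass null when the date value is null, and set the associated BigQuery columns to NULLABLE. The error "Only optional fields can be set to NULL" indicates that the target column (status in your message) is currently REQUIRED in the table schema: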
public TableRow mapRow(ResultSet resultSet) throws Exception {
  ResultSetMetaData metaData = resultSet.getMetaData();
  TableRow outputTableRow = new TableRow();
  for (int i = 1; i <= metaData.getColumnCount(); i++) {
    // NULL column: set an explicit null, which a NULLABLE BigQuery field accepts.
    if (resultSet.getObject(i) == null) {
      outputTableRow.set(getColumnRef(metaData, i), resultSet.getObject(i));
      // outputTableRow.set(getColumnRef(metaData, i), String.valueOf(resultSet.getObject(i)));
      continue;
    }

    /*
     * DATE:      EPOCH MILLISECONDS -> yyyy-MM-dd
     * DATETIME:  EPOCH MILLISECONDS -> yyyy-MM-dd hh:mm:ss.SSSSSS
     * TIMESTAMP: EPOCH MILLISECONDS -> yyyy-MM-dd hh:mm:ss.SSSSSSXXX
     *
     * MySQL drivers have ColumnTypeName in all caps and postgres in small case
     */
    switch (metaData.getColumnTypeName(i).toLowerCase()) {
      case "date":
        // Format only when a value is present; otherwise keep null.
        String date = Optional.ofNullable(resultSet.getDate(i))
            .map(d -> dateFormatter.format(d))
            .orElse(null);
        outputTableRow.set(getColumnRef(metaData, i), date);
        break;
      case "datetime":
        String datetime = Optional.ofNullable(resultSet.getObject(i))
            .map(d -> datetimeFormatter.format((TemporalAccessor) d))
            .orElse(null);
        outputTableRow.set(getColumnRef(metaData, i), datetime);
        break;
      case "timestamp":
        String timestamp = Optional.ofNullable(resultSet.getTimestamp(i))
            .map(t -> timestampFormatter.format(t))
            .orElse(null);
        outputTableRow.set(getColumnRef(metaData, i), timestamp);
        break;
      case "clob":
        Clob clobObject = resultSet.getClob(i);
        if (clobObject.length() > Integer.MAX_VALUE) {
          LOG.warn(
              "The Clob value size {} in column {} exceeds 2GB and will be truncated.",
              clobObject.length(),
              getColumnRef(metaData, i));
        }
        outputTableRow.set(
            getColumnRef(metaData, i), clobObject.getSubString(1, (int) clobObject.length()));
        break;
      default:
        outputTableRow.set(getColumnRef(metaData, i), resultSet.getObject(i).toString());
    }
  }
  return outputTableRow;
}
For the date, datetime and timestamp blocks, I apply the transformation only when the value is not null; otherwise I fall back to the default value null.
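The null-safe conversion can be tried in isolation with the following JDK-only sketch. It makes several assumptions: a LinkedHashMap stands in for Beam's TableRow, the class NullSafeRowSketch and its methods formatDate and toRow are hypothetical names, and the yyyy-MM-dd pattern is taken from the comment in the mapper rather than from the template's actual formatter fields.

```java
import java.sql.Date;
import java.text.SimpleDateFormat;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

class NullSafeRowSketch {
    // Formats a java.sql.Date as yyyy-MM-dd, or returns null when the column was NULL.
    static String formatDate(Date value) {
        return Optional.ofNullable(value)
            .map(d -> new SimpleDateFormat("yyyy-MM-dd").format(d))
            .orElse(null);
    }

    // Builds a row in which a NULL column stays present as an explicit null value
    // (a Map is used here only as a stand-in for TableRow).
    static Map<String, Object> toRow(String status, Date bookingDate) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("status", status);
        row.put("booking_date", formatDate(bookingDate));
        return row;
    }

    public static void main(String[] args) {
        System.out.println(toRow("CONFIRMED", Date.valueOf("2023-01-15")));
        System.out.println(toRow(null, null));
    }
}
```

A row built this way carries a real null instead of the text "null" or a missing key. BigQuery should accept it only once the corresponding columns are NULLABLE, as described above.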