How to handle NULL values in BigQuery while writing through Dataflow?
I am ingesting data from a database into BigQuery using the JdbcIO source connector and the BigQueryIO sink connector provided by Apache Beam.

Below is my sample table data:
As you can see, a few columns such as id and booking_date contain NULL values. So when I try to write the data into BigQuery, it gives the below error:
"message": "Error while reading data, error message: JSON parsing error in row starting at position 0: Only optional fields can be set to NULL. Field: status; Value: NULL
If I pass null in booking_date, it gives an invalid date format error.
Below is the RowMapper I am using to convert the JdbcIO result set into a TableRow. It is the same code that the GCP JdbcToBigQuery template uses.
public TableRow mapRow(ResultSet resultSet) throws Exception {
  ResultSetMetaData metaData = resultSet.getMetaData();
  TableRow outputTableRow = new TableRow();
  for (int i = 1; i <= metaData.getColumnCount(); i++) {
    if (resultSet.getObject(i) == null) {
      outputTableRow.set(getColumnRef(metaData, i), resultSet.getObject(i));
      // outputTableRow.set(getColumnRef(metaData, i), String.valueOf(resultSet.getObject(i)));
      continue;
    }
    /*
     * DATE: EPOCH MILLISECONDS -> yyyy-MM-dd
     * DATETIME: EPOCH MILLISECONDS -> yyyy-MM-dd hh:mm:ss.SSSSSS
     * TIMESTAMP: EPOCH MILLISECONDS -> yyyy-MM-dd hh:mm:ss.SSSSSSXXX
     *
     * MySQL drivers have ColumnTypeName in all caps and postgres in small case
     */
    switch (metaData.getColumnTypeName(i).toLowerCase()) {
      case "date":
        outputTableRow.set(
            getColumnRef(metaData, i), dateFormatter.format(resultSet.getDate(i)));
        break;
      case "datetime":
        outputTableRow.set(
            getColumnRef(metaData, i),
            datetimeFormatter.format((TemporalAccessor) resultSet.getObject(i)));
        break;
      case "timestamp":
        outputTableRow.set(
            getColumnRef(metaData, i), timestampFormatter.format(resultSet.getTimestamp(i)));
        break;
      case "clob":
        Clob clobObject = resultSet.getClob(i);
        if (clobObject.length() > Integer.MAX_VALUE) {
          LOG.warn(
              "The Clob value size {} in column {} exceeds 2GB and will be truncated.",
              clobObject.length(),
              getColumnRef(metaData, i));
        }
        outputTableRow.set(
            getColumnRef(metaData, i), clobObject.getSubString(1, (int) clobObject.length()));
        break;
      default:
        outputTableRow.set(getColumnRef(metaData, i), resultSet.getObject(i).toString());
    }
  }
  return outputTableRow;
}
See the GCP JdbcToBigQuery template for more details.
Solutions I tried, without success:

When the value is null, I tried to skip that particular column, but then it gives the error Missing required field.

Another attempt gives:

Could not convert value 'string_value: \t \"null\"' to integer

How can I handle all the NULL cases? Please note, I cannot ignore these rows, since some of their columns do contain values.
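One likely source of the "Could not convert value 'string_value: \"null\"' to integer" error is the commented-out String.valueOf line in the RowMapper above: for a null argument, String.valueOf(Object) returns the literal four-character string "null", so the TableRow field contains text rather than a JSON null, and BigQuery then tries to parse that text as an integer. A quick standalone check:

```java
public class NullStringDemo {
    public static void main(String[] args) {
        Object value = null;
        // String.valueOf(Object) returns the literal string "null" for a null
        // argument, not a null reference.
        String converted = String.valueOf(value);
        System.out.println(converted);          // prints null
        System.out.println(converted.length()); // prints 4
    }
}
```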
To solve your issue, you have to pass null if the date value is null, and you have to set the associated BigQuery columns to NULLABLE:
public TableRow mapRow(ResultSet resultSet) throws Exception {
  ResultSetMetaData metaData = resultSet.getMetaData();
  TableRow outputTableRow = new TableRow();
  for (int i = 1; i <= metaData.getColumnCount(); i++) {
    if (resultSet.getObject(i) == null) {
      outputTableRow.set(getColumnRef(metaData, i), resultSet.getObject(i));
      continue;
    }
    /*
     * DATE: EPOCH MILLISECONDS -> yyyy-MM-dd
     * DATETIME: EPOCH MILLISECONDS -> yyyy-MM-dd hh:mm:ss.SSSSSS
     * TIMESTAMP: EPOCH MILLISECONDS -> yyyy-MM-dd hh:mm:ss.SSSSSSXXX
     *
     * MySQL drivers have ColumnTypeName in all caps and postgres in small case
     */
    switch (metaData.getColumnTypeName(i).toLowerCase()) {
      case "date":
        String date = Optional.ofNullable(resultSet.getDate(i))
            .map(d -> dateFormatter.format(d))
            .orElse(null);
        outputTableRow.set(getColumnRef(metaData, i), date);
        break;
      case "datetime":
        String datetime = Optional.ofNullable(resultSet.getObject(i))
            .map(d -> datetimeFormatter.format((TemporalAccessor) d))
            .orElse(null);
        outputTableRow.set(getColumnRef(metaData, i), datetime);
        break;
      case "timestamp":
        String timestamp = Optional.ofNullable(resultSet.getTimestamp(i))
            .map(t -> timestampFormatter.format(t))
            .orElse(null);
        outputTableRow.set(getColumnRef(metaData, i), timestamp);
        break;
      case "clob":
        Clob clobObject = resultSet.getClob(i);
        if (clobObject.length() > Integer.MAX_VALUE) {
          LOG.warn(
              "The Clob value size {} in column {} exceeds 2GB and will be truncated.",
              clobObject.length(),
              getColumnRef(metaData, i));
        }
        outputTableRow.set(
            getColumnRef(metaData, i), clobObject.getSubString(1, (int) clobObject.length()));
        break;
      default:
        outputTableRow.set(getColumnRef(metaData, i), resultSet.getObject(i).toString());
    }
  }
  return outputTableRow;
}
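The null-safe pattern used in the date, datetime and timestamp branches can be exercised on its own with just java.time, no JDBC required. This is a minimal sketch; the yyyy-MM-dd formatter and the formatOrNull helper name are assumptions for illustration, not part of the pipeline code:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Optional;

public class NullSafeFormatDemo {
    // Illustrative formatter; the real dateFormatter in the pipeline may differ.
    private static final DateTimeFormatter DATE_FORMATTER =
        DateTimeFormatter.ofPattern("yyyy-MM-dd");

    // Mirrors the Optional.ofNullable(...).map(...).orElse(null) pattern:
    // format when a value is present, otherwise propagate a real null
    // (not the string "null") so BigQuery sees a JSON null.
    static String formatOrNull(LocalDate date) {
        return Optional.ofNullable(date)
            .map(DATE_FORMATTER::format)
            .orElse(null);
    }

    public static void main(String[] args) {
        System.out.println(formatOrNull(LocalDate.of(2023, 1, 15))); // prints 2023-01-15
        System.out.println(formatOrNull(null));                      // prints null
    }
}
```

The key point is that null flows through unchanged instead of being stringified, which is what lets a NULLABLE BigQuery column accept it.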
For the date, datetime and timestamp blocks, I applied the transformation only if the value is not null; otherwise I kept the default null value.