How to handle NULL Value in BigQuery while writing through Dataflow?

I am extracting data from a database into BigQuery using the JdbcIO source connector and the BigQueryIO sink connector provided by Apache Beam.

Below is my sample table data:

[image: sample table data]

As we can see, several columns, such as id and booking_date, contain NULL values. When I try to write the data to BigQuery, it fails with the following error:

"message": "Error while reading data, error message: JSON parsing error in row starting at position 0: Only optional fields can be set to NULL. Field: status; Value: NULL 

If I pass null for booking_date, I get an invalid date format error.

Below is the RowMapper I use to convert the JdbcIO result set into a TableRow. It is the same code used by the GCP JdbcToBigQuery template.

public TableRow mapRow(ResultSet resultSet) throws Exception {
  ResultSetMetaData metaData = resultSet.getMetaData();
  TableRow outputTableRow = new TableRow();
  for (int i = 1; i <= metaData.getColumnCount(); i++) {
    if (resultSet.getObject(i) == null) {
      outputTableRow.set(getColumnRef(metaData, i), resultSet.getObject(i));
      // outputTableRow.set(getColumnRef(metaData, i), String.valueOf(resultSet.getObject(i)));
      continue;
    }

    /*
     * DATE:      EPOCH MILLISECONDS -> yyyy-MM-dd
     * DATETIME:  EPOCH MILLISECONDS -> yyyy-MM-dd hh:mm:ss.SSSSSS
     * TIMESTAMP: EPOCH MILLISECONDS -> yyyy-MM-dd hh:mm:ss.SSSSSSXXX
     *
     * MySQL drivers have ColumnTypeName in all caps and postgres in small case
     */
    switch (metaData.getColumnTypeName(i).toLowerCase()) {
      case "date":
        outputTableRow.set(
            getColumnRef(metaData, i), dateFormatter.format(resultSet.getDate(i)));
        break;
      case "datetime":
        outputTableRow.set(
            getColumnRef(metaData, i),
            datetimeFormatter.format((TemporalAccessor) resultSet.getObject(i)));
        break;
      case "timestamp":
        outputTableRow.set(
            getColumnRef(metaData, i), timestampFormatter.format(resultSet.getTimestamp(i)));
        break;
      case "clob":
        Clob clobObject = resultSet.getClob(i);
        if (clobObject.length() > Integer.MAX_VALUE) {
          LOG.warn(
              "The Clob value size {} in column {} exceeds 2GB and will be truncated.",
              clobObject.length(),
              getColumnRef(metaData, i));
        }
        outputTableRow.set(
            getColumnRef(metaData, i), clobObject.getSubString(1, (int) clobObject.length()));
        break;
      default:
        outputTableRow.set(getColumnRef(metaData, i), resultSet.getObject(i).toString());
    }
  }

  return outputTableRow;
}

For more details, see JdbcToBigQuery.

Solutions I tried without success

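  • I tried skipping the column when its value is null, but that fails with the error Missing required field.
  • I tried hardcoding the value to "null" in every case so that I could handle that specific value later, but that fails with the error Could not convert value 'string_value: \t \"null\"' to integer.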
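
The second attempt fails because String.valueOf on a null reference produces the four-character string "null" rather than a missing value, so BigQuery tries to parse that text as the column's type. A minimal, JDK-only sketch of the difference follows; the class and method names are hypothetical and a plain Map stands in for the TableRow.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical demo class; a plain Map stands in for the Beam TableRow.
final class NullVersusNullString {

    // Mirrors the commented-out line in the mapper: stringify whatever came back.
    // For a null reference this returns the String "null", not a null value.
    static Object asStringified(Object columnValue) {
        return String.valueOf(columnValue);
    }

    // Keeps the reference untouched, so a database NULL stays a real null.
    static Object asIs(Object columnValue) {
        return columnValue;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("stringified", asStringified(null));
        row.put("as_is", asIs(null));
        // Report whether each entry is a String or a real null.
        row.forEach((name, value) ->
                System.out.println(name + " is a String: " + (value instanceof String)));
    }
}
```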

How can I handle all the NULL cases? Please note that I cannot skip these rows, because some of their columns do contain values.

To solve your problem, you must pass null if the date value is null, and you must set the associated BigQuery column to NULLABLE.

public TableRow mapRow(ResultSet resultSet) throws Exception {
  ResultSetMetaData metaData = resultSet.getMetaData();
  TableRow outputTableRow = new TableRow();
  for (int i = 1; i <= metaData.getColumnCount(); i++) {
    // Pass a real null through; the target BigQuery column must be NULLABLE.
    if (resultSet.getObject(i) == null) {
      outputTableRow.set(getColumnRef(metaData, i), resultSet.getObject(i));
      // outputTableRow.set(getColumnRef(metaData, i), String.valueOf(resultSet.getObject(i)));
      continue;
    }

    /*
     * DATE:      EPOCH MILLISECONDS -> yyyy-MM-dd
     * DATETIME:  EPOCH MILLISECONDS -> yyyy-MM-dd hh:mm:ss.SSSSSS
     * TIMESTAMP: EPOCH MILLISECONDS -> yyyy-MM-dd hh:mm:ss.SSSSSSXXX
     *
     * MySQL drivers have ColumnTypeName in all caps and postgres in small case
     */
    switch (metaData.getColumnTypeName(i).toLowerCase()) {
      case "date":
        // Format only when the typed getter returns a value; otherwise keep null.
        String date = Optional.ofNullable(resultSet.getDate(i))
            .map(d -> dateFormatter.format(d))
            .orElse(null);

        outputTableRow.set(getColumnRef(metaData, i), date);
        break;
      case "datetime":
        String datetime = Optional.ofNullable(resultSet.getObject(i))
            .map(d -> datetimeFormatter.format((TemporalAccessor) d))
            .orElse(null);

        outputTableRow.set(getColumnRef(metaData, i), datetime);
        break;
      case "timestamp":
        String timestamp = Optional.ofNullable(resultSet.getTimestamp(i))
            .map(t -> timestampFormatter.format(t))
            .orElse(null);

        outputTableRow.set(getColumnRef(metaData, i), timestamp);
        break;
      case "clob":
        Clob clobObject = resultSet.getClob(i);
        if (clobObject.length() > Integer.MAX_VALUE) {
          LOG.warn(
              "The Clob value size {} in column {} exceeds 2GB and will be truncated.",
              clobObject.length(),
              getColumnRef(metaData, i));
        }
        outputTableRow.set(
            getColumnRef(metaData, i), clobObject.getSubString(1, (int) clobObject.length()));
        break;
      default:
        outputTableRow.set(getColumnRef(metaData, i), resultSet.getObject(i).toString());
    }
  }

  return outputTableRow;
}

For the date, datetime, and timestamp blocks, I apply the conversion only when the value is not null; otherwise I fall back to the default value null.
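
The same "format only when present" pattern can be factored into a small helper. The sketch below uses only the JDK so that it runs on its own: a plain Map stands in for TableRow, and the class name, helper names, formatter patterns, and the cancel_date column are assumptions made for illustration, not part of the JdbcToBigQuery template.

```java
import java.sql.Date;
import java.sql.Timestamp;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical helper; a plain Map stands in for the Beam TableRow.
final class NullSafeColumnFormat {

    // Assumed patterns, modeled on the formats listed in the mapper's comment.
    static final DateTimeFormatter DATE = DateTimeFormatter.ofPattern("yyyy-MM-dd");
    static final DateTimeFormatter TIMESTAMP =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSSSSS");

    // Applies the formatter only when the value is present; otherwise yields null.
    static <T> String formatOrNull(T value, Function<T, String> formatter) {
        return Optional.ofNullable(value).map(formatter).orElse(null);
    }

    static String formatDate(Date value) {
        return formatOrNull(value, d -> DATE.format(d.toLocalDate()));
    }

    static String formatTimestamp(Timestamp value) {
        return formatOrNull(value, t -> TIMESTAMP.format(t.toLocalDateTime()));
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("booking_date", formatDate(Date.valueOf("2023-01-15")));
        row.put("cancel_date", formatDate(null)); // hypothetical column; stays a real null
        row.put("updated_at", formatTimestamp(Timestamp.valueOf("2023-01-15 10:30:00")));
        System.out.println(row);
    }
}
```

With a helper like this, each case in the switch reduces to a single set call, and the null handling lives in one place instead of being repeated per column type.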
