
How to handle NULL values in BigQuery while writing through Dataflow?

I am ingesting data from a database into BigQuery using the JdbcIO source connector and the BigQueryIO sink connector provided by Apache Beam.
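For context, the pipeline is wired up roughly like this; the driver class, connection string, query, table name and the ResultSetToTableRow wrapper class below are simplified placeholders, not my real values:

import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder;
import org.apache.beam.sdk.io.jdbc.JdbcIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class JdbcToBigQueryPipeline {

  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    pipeline
        .apply(
            "Read from JDBC",
            JdbcIO.<TableRow>read()
                .withDataSourceConfiguration(
                    JdbcIO.DataSourceConfiguration.create(
                            "com.mysql.cj.jdbc.Driver", "jdbc:mysql://host:3306/mydb")
                        .withUsername("user")
                        .withPassword("password"))
                .withQuery("SELECT * FROM bookings")
                .withCoder(TableRowJsonCoder.of())
                // ResultSetToTableRow implements JdbcIO.RowMapper<TableRow> and contains
                // the mapRow method shown further down.
                .withRowMapper(new ResultSetToTableRow()))
        .apply(
            "Write to BigQuery",
            BigQueryIO.writeTableRows()
                .to("my-project:my_dataset.bookings")
                .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
                .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));

    pipeline.run();
  }
}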

Below is my sample table data:

[screenshot: sample table data]

As you can see, a few columns, such as id and booking_date, contain NULL values. So when I try to write the data into BigQuery, I get the error below:

"message": "Error while reading data, error message: JSON parsing error in row starting at position 0: Only optional fields can be set to NULL. Field: status; Value: NULL 

If I pass null in booking_date, it gives an invalid date format error.

Below is the RowMapper I am using to convert the JdbcIO ResultSet into a TableRow. It is the same code that the GCP JdbcToBigQuery template uses.

public TableRow mapRow(ResultSet resultSet) throws Exception {
  ResultSetMetaData metaData = resultSet.getMetaData();
  TableRow outputTableRow = new TableRow();
  for (int i = 1; i <= metaData.getColumnCount(); i++) {
    if (resultSet.getObject(i) == null) {
      outputTableRow.set(getColumnRef(metaData, i), resultSet.getObject(i));
    // outputTableRow.set(getColumnRef(metaData, i), String.valueOf(resultSet.getObject(i)));
      continue;
    }

    /*
     * DATE:      EPOCH MILLISECONDS -> yyyy-MM-dd
     * DATETIME:  EPOCH MILLISECONDS -> yyyy-MM-dd hh:mm:ss.SSSSSS
     * TIMESTAMP: EPOCH MILLISECONDS -> yyyy-MM-dd hh:mm:ss.SSSSSSXXX
     *
     * MySQL drivers have ColumnTypeName in all caps and postgres in small case
     */
    switch (metaData.getColumnTypeName(i).toLowerCase()) {
      case "date":
        outputTableRow.set(
            getColumnRef(metaData, i), dateFormatter.format(resultSet.getDate(i)));
        break;
      case "datetime":
        outputTableRow.set(
            getColumnRef(metaData, i),
            datetimeFormatter.format((TemporalAccessor) resultSet.getObject(i)));
        break;
      case "timestamp":
        outputTableRow.set(
            getColumnRef(metaData, i), timestampFormatter.format(resultSet.getTimestamp(i)));
        break;
      case "clob":
        Clob clobObject = resultSet.getClob(i);
        if (clobObject.length() > Integer.MAX_VALUE) {
          LOG.warn(
              "The Clob value size {} in column {} exceeds 2GB and will be truncated.",
              clobObject.length(),
              getColumnRef(metaData, i));
        }
        outputTableRow.set(
            getColumnRef(metaData, i), clobObject.getSubString(1, (int) clobObject.length()));
        break;
      default:
        outputTableRow.set(getColumnRef(metaData, i), resultSet.getObject(i).toString());
    }
  }

  return outputTableRow;
}

See the JdbcToBigQuery template for more details.

Solutions I tried without success:

  • I tried skipping the column entirely when it is null, but then it gives the error Missing required field.
  • I tried hard-coding the value as the string "null" for all cases, so that I could handle it later, but it gives the error Could not convert value 'string_value: \t \"null\"' to integer.

How can I handle all the NULL cases? Please note that I cannot drop these rows, since their other columns do contain values.

To solve your issue, you have to pass null when the date value is null, and the associated BigQuery columns must be set to NULLABLE (see the schema sketch at the end of this answer):

public TableRow mapRow(ResultSet resultSet) throws Exception {
  ResultSetMetaData metaData = resultSet.getMetaData();
  TableRow outputTableRow = new TableRow();
  for (int i = 1; i <= metaData.getColumnCount(); i++) {
    if (resultSet.getObject(i) == null) {
      // The driver returned null: write an explicit null into the TableRow.
      outputTableRow.set(getColumnRef(metaData, i), resultSet.getObject(i));
      continue;
    }

    /*
     * DATE:      EPOCH MILLISECONDS -> yyyy-MM-dd
     * DATETIME:  EPOCH MILLISECONDS -> yyyy-MM-dd hh:mm:ss.SSSSSS
     * TIMESTAMP: EPOCH MILLISECONDS -> yyyy-MM-dd hh:mm:ss.SSSSSSXXX
     *
     * MySQL drivers have ColumnTypeName in all caps and postgres in small case
     */
    switch (metaData.getColumnTypeName(i).toLowerCase()) {
      case "date":
        // Format only when a non-null value is returned; otherwise keep null.
        String date = Optional.ofNullable(resultSet.getDate(i))
            .map(d -> dateFormatter.format(d))
            .orElse(null);
        outputTableRow.set(getColumnRef(metaData, i), date);
        break;
      case "datetime":
        String datetime = Optional.ofNullable(resultSet.getObject(i))
            .map(d -> datetimeFormatter.format((TemporalAccessor) d))
            .orElse(null);
        outputTableRow.set(getColumnRef(metaData, i), datetime);
        break;
      case "timestamp":
        String timestamp = Optional.ofNullable(resultSet.getTimestamp(i))
            .map(t -> timestampFormatter.format(t))
            .orElse(null);
        outputTableRow.set(getColumnRef(metaData, i), timestamp);
        break;
      case "clob":
        Clob clobObject = resultSet.getClob(i);
        if (clobObject.length() > Integer.MAX_VALUE) {
          LOG.warn(
              "The Clob value size {} in column {} exceeds 2GB and will be truncated.",
              clobObject.length(),
              getColumnRef(metaData, i));
        }
        outputTableRow.set(
            getColumnRef(metaData, i), clobObject.getSubString(1, (int) clobObject.length()));
        break;
      default:
        outputTableRow.set(getColumnRef(metaData, i), resultSet.getObject(i).toString());
    }
  }

  return outputTableRow;
}

For the date, datetime and timestamp blocks, I apply the formatting only when the value is not null; otherwise I set the field to null.
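If the target table already exists with REQUIRED columns, those columns also have to be relaxed to NULLABLE (for example through the BigQuery console or a schema update). As a rough sketch, this is what a nullable schema could look like if you let BigQueryIO create the table; the field names and types below are only assumptions based on the sample data in the question:

import com.google.api.services.bigquery.model.TableFieldSchema;
import com.google.api.services.bigquery.model.TableSchema;
import java.util.Arrays;

public class NullableSchema {

  // Every column that may receive NULL must use mode NULLABLE.
  // Field names and types are assumptions based on the question's sample data.
  static TableSchema create() {
    return new TableSchema()
        .setFields(
            Arrays.asList(
                new TableFieldSchema().setName("id").setType("INTEGER").setMode("NULLABLE"),
                new TableFieldSchema().setName("status").setType("STRING").setMode("NULLABLE"),
                new TableFieldSchema()
                    .setName("booking_date")
                    .setType("DATE")
                    .setMode("NULLABLE")));
  }
}

This schema can then be passed to BigQueryIO.writeTableRows().withSchema(...) together with CREATE_IF_NEEDED, so that the table is created with nullable columns from the start.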
