
How to write failed row inserts from a streaming job to BigQuery using the Apache Beam Java SDK?

While running a streaming job, it is always good to keep a record of the rows that could not be inserted into BigQuery. Catching those rows and writing them to another BigQuery table gives you an idea of what went wrong.

Below are the steps you can follow to achieve this.

Pre-requisites:

  • Apache Beam Java SDK >= 2.10.0 (or the latest release)
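If you manage dependencies with Maven, the BigQuery connector lives in Beam's Google Cloud Platform IO module. A sketch of the dependency entry (the version shown is just an example; use whichever matches your Beam SDK):

```xml
<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-sdks-java-io-google-cloud-platform</artifactId>
  <version>2.10.0</version>
</dependency>
```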

Using the getFailedInsertsWithErr() method available in the SDK (on the WriteResult returned by BigQueryIO.write()), you can easily catch the failed inserts and push them to another table for root-cause analysis (RCA). This is an important feature for debugging streaming pipelines that run indefinitely.
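A minimal sketch of the wiring, assuming an input PCollection of TableRow and a hypothetical destination table name. Note that withExtendedErrorInfo() must be set for getFailedInsertsWithErr() to be usable on the result:

```java
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryInsertError;
import org.apache.beam.sdk.io.gcp.bigquery.InsertRetryPolicy;
import org.apache.beam.sdk.io.gcp.bigquery.WriteResult;
import org.apache.beam.sdk.values.PCollection;

public class FailedInsertExample {

  /** Writes rows to BigQuery and returns the inserts the service rejected. */
  static PCollection<BigQueryInsertError> writeAndCaptureFailures(
      PCollection<TableRow> rows) {
    WriteResult result = rows.apply("WriteToBQ",
        BigQueryIO.writeTableRows()
            .to("my-project:my_dataset.my_table")   // hypothetical table name
            .withExtendedErrorInfo()                // required for getFailedInsertsWithErr()
            .withFailedInsertRetryPolicy(InsertRetryPolicy.retryTransientErrors())
            .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));

    // Each element wraps the rejected row, the insert error, and the table reference.
    return result.getFailedInsertsWithErr();
  }
}
```

The retry policy matters here: retryTransientErrors() retries recoverable failures, so only permanently rejected rows (e.g. schema mismatches) flow into the failed-inserts collection.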

BigQueryInsertError is the error class returned by BigQuery for a failed TableRow. It contains the following:

  • The row that failed to insert.
  • The error message and stack-trace payload.
  • The table reference object.

The above fields can be captured and pushed into another BigQuery table. An example schema for the error records:

    {
        "fields": [{
                "name": "timestamp",
                "type": "TIMESTAMP",
                "mode": "REQUIRED"
            },
            {
                "name": "payloadString",
                "type": "STRING",
                "mode": "REQUIRED"
            },
            {
                "name": "errorMessage",
                "type": "STRING",
                "mode": "NULLABLE"
            },
            {
                "name": "stacktrace",
                "type": "STRING",
                "mode": "NULLABLE"
            }
        ]
    }
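Shaping a failed insert into a row that matches this schema is a simple field mapping, which you would typically do inside a DoFn or MapElements transform over the failed-inserts collection before writing to the error table. A plain-Java sketch of that mapping (using a Map to stand in for a TableRow; the input strings are hypothetical):

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

public class ErrorRecord {

  /** Builds a record matching the error-table schema above. */
  static Map<String, Object> errorRow(
      String payload, String errorMessage, String stacktrace) {
    Map<String, Object> row = new LinkedHashMap<>();
    row.put("timestamp", Instant.now().toString()); // REQUIRED: when we caught the failure
    row.put("payloadString", payload);              // REQUIRED: the rejected row, e.g. as JSON
    row.put("errorMessage", errorMessage);          // NULLABLE: message from BigQuery
    row.put("stacktrace", stacktrace);              // NULLABLE: stack trace, if available
    return row;
  }
}
```

In the actual pipeline you would populate payload from the rejected row and errorMessage from the insert error carried by BigQueryInsertError, then write the resulting rows to the error table with a second BigQueryIO.write().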

