How to get failed insert records for file load insertion in BigQuery

I'm using Apache Beam (Java SDK) to insert records into BigQuery using the batch load method (file loads). I want to retrieve the records that failed during insertion.

Is it possible to apply a retry policy to failed records?

Below is my code:

public static void insertToBigQueryDataLake(
        final PCollectionTuple dataStoresCollectionTuple,
        final TupleTag<KV<DataLake, PayloadSpecs>> dataLakeValidTag,
        final Long loadJobTriggerFrequency,
        final Integer loadJobNumShard) {


    WriteResult writeResult = dataStoresCollectionTuple
            .get(dataLakeValidTag)
            .apply(TRANSFORMATION_NAME, DataLakeTableProcessor.dataLakeTableProcessorTransform())
            .apply(
                    WRITING_EVENTS_NAME,
                    BigQueryIO.<KV<DataLake, TableRowSpecs>>write()
                            .withMethod(BigQueryIO.Write.Method.FILE_LOADS)
                            .withTriggeringFrequency(Duration.standardMinutes(loadJobTriggerFrequency))
                            .withNumFileShards(loadJobNumShard)
                            .to(new DynamicTableRowDestinations<>(IS_DATA_LAKE))
                            .withFormatFunction(BigQueryServiceImpl::dataLakeTableRow));

    writeResult.getFailedInserts().apply(ParDo.of(new DoFn<TableRow, Void>() {
        @ProcessElement
        public void processElement(final ProcessContext processContext) throws IOException {
            System.out.println("Table Row : " + processContext.element().toPrettyString());
        }
    }));

}

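Using the getFailedInsertsWithErr() method, you can push the failed inserts to another table for root cause analysis (RCA); check here for more details. Note that, according to the Beam Javadoc, getFailedInsertsWithErr() requires withExtendedErrorInfo() on the write, and both it and withFailedInsertRetryPolicy() currently apply to Method.STREAMING_INSERTS. Batch load jobs (FILE_LOADS) do not report per-element failures, so this approach assumes you can switch the write method.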

Example:
// Write failed rows, together with their error details, to an error table.
// BigQueryInsertErrorExtractFn and tableRowToInsertView are not part of Beam; they
// appear to be user-defined helpers. The DoFn must emit TableRow elements for the error table.
writeResult
        .getFailedInsertsWithErr()
        // Window the failures so the error-table write can fire periodically.
        .apply(Window.into(FixedWindows.of(Duration.standardMinutes(5))))
        .apply("BQ-insert-error-extract", ParDo.of(new BigQueryInsertErrorExtractFn(tableRowToInsertView)).withSideInputs(tableRowToInsertView))
        .apply("BQ-insert-error-write", BigQueryIO.writeTableRows()
                .to(errTableSpec)
                .withJsonSchema(errSchema)
                .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
                .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));
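
On the retry-policy part of the question: Beam lets you attach a policy with withFailedInsertRetryPolicy(...), for example InsertRetryPolicy.retryTransientErrors(), which the Beam Javadoc describes as applying to streaming inserts. The plain-Java sketch below only illustrates the kind of decision such a policy makes: retry a failed row unless one of its error reasons looks permanent. The class RetryDecision and the set of "persistent" reasons are assumptions for illustration, not Beam's actual implementation.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical helper (not part of Beam): decides whether a failed row is worth retrying. */
public final class RetryDecision {
    // Assumed set of error reasons treated as permanent; adjust to your own data.
    private static final Set<String> PERSISTENT_REASONS =
            Set.of("invalid", "invalidQuery", "notImplemented");

    private RetryDecision() {}

    /** Returns true only if none of the reported error reasons is persistent. */
    public static boolean shouldRetry(List<String> errorReasons) {
        for (String reason : errorReasons) {
            if (PERSISTENT_REASONS.contains(reason)) {
                return false; // retrying cannot fix a malformed row
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(shouldRetry(List.of("timeout")));
        System.out.println(shouldRetry(List.of("timeout", "invalid")));
    }
}
```

Rows for which shouldRetry returns false are the ones worth sending to the error table for RCA, since retrying them would fail again.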
