
How to find rejected files due to errors in apache beam java sdk

I have N files of the same type to be processed, and I will provide a wildcard input pattern ( C:\users\*\* ). How do I find the file name and the record that were rejected while uploading to BigQuery in Java?
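Independent of Beam, one way to keep the file-to-record mapping is to tag each record with the path it was read from when the wildcard is expanded, so that any rejected record can be traced back to its source file. A minimal Python sketch using only the standard library (the pattern and the `upload` step are placeholders, not part of any Beam API):

```python
import glob

def tagged_records(pattern):
    """Expand a wildcard pattern and yield (source_path, line) pairs,
    so a rejected record can always be traced back to its file."""
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            for line in f:
                yield path, line.rstrip("\n")

# Example usage (upload() is hypothetical):
# for source, record in tagged_records(r"C:\users\*\*"):
#     try:
#         upload(record)
#     except Exception as err:
#         print("rejected", source, record, err)
```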

I guess BQ writes to the temp location path that you pass to your pipeline, not to a local path [honestly not sure about this].

In my case, with Python, I used to pass the temp location as a GCS bucket, and when an error occurred, the command-line logs usually showed the name of the log file that contains the rejected rows.

Then I use the gsutil cp command to copy that file to my local computer and read it.

BigQuery I/O (Java and Python SDKs) supports the dead-letter pattern: https://beam.apache.org/documentation/patterns/bigqueryio/

Java

// `result` is the WriteResult returned by applying BigQueryIO.write()
// with .withExtendedErrorInfo(), which enables getFailedInsertsWithErr().
result
    .getFailedInsertsWithErr()
    .apply(
        MapElements.into(TypeDescriptors.strings())
            .via(
                x -> {
                  // Each element reports the target table, the rejected
                  // row, and the error BigQuery returned for it.
                  System.out.println(" The table was " + x.getTable());
                  System.out.println(" The row was " + x.getRow());
                  System.out.println(" The error was " + x.getError());
                  return "";
                }));

Python

# `result` is the return value of beam.io.WriteToBigQuery; failed rows
# are available under the 'FailedRows' tag.
errors = (
  result['FailedRows']
  | 'PrintErrors' >>
  beam.FlatMap(lambda err: print("Error Found {}".format(err))))
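The dead-letter idea behind both snippets — routing records that fail insertion to a side collection for later inspection instead of failing the whole job — can be illustrated without Beam at all. A minimal plain-Python sketch, where `validate` is a stand-in for BigQuery's row validation:

```python
def deadletter_split(records, validate):
    """Partition records into (accepted, rejected); each rejected entry
    keeps both the record and the error so it can be inspected later,
    mirroring what getFailedInsertsWithErr() / 'FailedRows' expose."""
    accepted, rejected = [], []
    for rec in records:
        try:
            validate(rec)
            accepted.append(rec)
        except ValueError as err:
            rejected.append({"record": rec, "error": str(err)})
    return accepted, rejected
```

The key design point is that rejection is data, not an exception that stops the pipeline: bad rows flow to their own output, where they can be logged or written elsewhere.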
