How to get better logs from BigQuery schema errors

Question

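I'm hitting the same issue as "Error while reading data, error message: JSON table encountered too many errors, giving up. Rows", and I'm fairly sure it is schema related: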

RuntimeError: BigQuery job beam_bq_job_LOAD_AUTOMATIC_JOB_NAME_LOAD_STEP_... failed. Error Result: <ErrorProto location: 'gs://dataflow/tmp/bq_load/some_file'
     message: 'Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1. Please look into the errors[] collection for more details. File: gs://some_file'
     reason: 'invalid'> [while running 'WriteToBigQuery/BigQueryBatchFileLoads/WaitForDestinationLoadJobs-ptransform-27']

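The problem is that I have a large schema (this runs as a Dataflow job), and combing through it by hand for one small mistake is tedious. Is there a way to see a better error message, or to get more logs that actually pinpoint which part of the schema is wrong?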

Answer 1

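I often run into the same problem with Beam Python and BigQueryIO: the error is not clear, and it does not indicate which field in the schema is at fault.

To deal with this kind of issue, I usually validate the input elements against a schema or an object, and use a dead letter queue for the elements that fail.

I then sink the errors to a BigQuery table for analysis.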
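As a rough illustration of the validation step, here is a minimal pure-Python sketch that checks each row against a BigQuery-style schema before the write and reports which field is wrong. The schema, the type mapping, and the names validate_row and split_valid_invalid are assumptions made up for this example, not part of Beam or BigQuery, and the type checks are deliberately incomplete.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical schema, in the same shape as a BigQuery JSON schema.
SCHEMA = [
    {"name": "name", "type": "STRING", "mode": "REQUIRED"},
    {"name": "score", "type": "INTEGER", "mode": "NULLABLE"},
    {"name": "tags", "type": "STRING", "mode": "REPEATED"},
]

# Python types accepted for a few BigQuery types (deliberately incomplete).
PY_TYPES = {
    "STRING": (str,),
    "INTEGER": (int,),
    "FLOAT": (int, float),
    "BOOLEAN": (bool,),
}


def _type_ok(value: Any, bq_type: str) -> bool:
    expected = PY_TYPES.get(bq_type)
    if expected is None:
        return True  # type not covered by this sketch: skip the check
    # bool is a subclass of int in Python, so reject it for non-boolean fields.
    if isinstance(value, bool) and bq_type != "BOOLEAN":
        return False
    return isinstance(value, expected)


def validate_row(row: Dict[str, Any], schema: List[Dict[str, str]]) -> List[str]:
    """Return one message per field that does not match the schema."""
    errors = []
    known = {field["name"] for field in schema}
    for extra in sorted(set(row) - known):
        errors.append(f"{extra}: not in schema")
    for field in schema:
        name, bq_type = field["name"], field["type"]
        mode = field.get("mode", "NULLABLE")
        value = row.get(name)
        if value is None:
            if mode == "REQUIRED":
                errors.append(f"{name}: required field is missing or null")
            continue
        if mode == "REPEATED":
            if not isinstance(value, list):
                errors.append(f"{name}: expected a list for REPEATED field")
                continue
            values = value
        else:
            values = [value]
        for item in values:
            if not _type_ok(item, bq_type):
                errors.append(
                    f"{name}: expected {bq_type}, got {type(item).__name__}"
                )
                break
    return errors


def split_valid_invalid(
    rows: List[Dict[str, Any]], schema: List[Dict[str, str]]
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Separate rows that pass validation from dead-letter records."""
    good, dead_letters = [], []
    for row in rows:
        errs = validate_row(row, schema)
        if errs:
            dead_letters.append({"row": row, "errors": errs})
        else:
            good.append(row)
    return good, dead_letters
```

In a Beam pipeline the same check could presumably run inside a DoFn that emits failing rows on a tagged side output, so only rows that pass reach WriteToBigQuery; that wiring is not shown here.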

I created a library called Asgarde to simplify error handling with Beam:

# Beam pipeline with Asgarde library.
input_teams: PCollection[str] = p | 'Read' >> beam.Create(team_names)

result = (CollectionComposer.of(input_teams)
            .map('Map with country', lambda tname: TeamInfo(name=tname, country=team_countries[tname], city=''))
            .map('Map with city', lambda tinfo: TeamInfo(name=tinfo.name, country=tinfo.country, city=team_cities[tinfo.name]))
            .filter('Filter french team', lambda tinfo: tinfo.country == 'France'))

result_outputs: PCollection[TeamInfo] = result.outputs
result_failures: PCollection[Failure] = result.failures

The CollectionComposer wrapper is created from a PCollection, and this structure returns:

a PCollection of good outputs
a PCollection of failures

A failure is represented by a Failure object:

@dataclass
class Failure:
    pipeline_step: str
    input_element: str
    exception: Exception

You can then sink the Failure PCollection to a BigQuery table for analysis.
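Since a Failure holds an Exception object, it likely needs to be flattened into plain values before it can be written as a table row. The sketch below shows one way to do that. The Failure dataclass is redeclared with the same three fields as above so the snippet is self-contained; failure_to_row and the column names are hypothetical.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass
class Failure:
    # Same three fields as the Failure object shown above.
    pipeline_step: str
    input_element: str
    exception: Exception


def failure_to_row(failure: Failure, now: Optional[datetime] = None) -> Dict[str, str]:
    """Flatten a Failure into a JSON-serializable dict (hypothetical columns)."""
    now = now or datetime.now(timezone.utc)
    return {
        "pipeline_step": failure.pipeline_step,
        "input_element": failure.input_element,
        "exception_type": type(failure.exception).__name__,
        "exception_message": str(failure.exception),
        "created_at": now.isoformat(),
    }


if __name__ == "__main__":
    example = Failure("Map with city", "PSG", KeyError("PSG"))
    print(json.dumps(failure_to_row(example), indent=2))
```

A function like this could be applied to the failures PCollection with a plain map step before the BigQuery write, with every column declared as STRING (or TIMESTAMP for created_at) in the error table.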

You can also check this article: Dead letter queue for errors with Beam, Asgarde, Dataflow and alerting in real time

