Is there a way to get the list of generated files when exporting a large table from BigQuery to GCS with the wildcard option?

Question

I am exporting a large BigQuery table to separate files in GCS using a wildcard (*) export. I used the code sample provided in the GCP documentation:

from google.cloud import bigquery
client = bigquery.Client()
bucket_name = 'bucket'
project = "project"
dataset_id = "dataset"
table_id = "table"


destination_uri = "gs://{}/{}".format(bucket_name, "table*.parquet")
dataset_ref = bigquery.DatasetReference(project, dataset_id)
table_ref = dataset_ref.table(table_id)

extract_job = client.extract_table(
    table_ref,
    destination_uri,
    # Location must match that of the source table.
    location="US",
)  # API request
extract_job.result()  # Waits for job to complete.

print(
    "Exported {}:{}.{} to {}".format(project, dataset_id, table_id, destination_uri)
)

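This produced 19 different files in my bucket, named like mytable000000000000.parquet, mytable000000000001.parquet, and so on (up to 0000000000019).

Ideally there would be a way to get the list of these file names automatically, so that I can compose them together or loop over them to perform other operations. Is there a simple way to edit the code above to do this?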

Answer 1

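You do not get an explicit list back when you use a wildcard, but take a look at the destinationUriFileCounts field in the extract job statistics. It tells you how many files were written. In Python it is exposed as the destination_uri_file_counts property of the extract job.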
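As a rough sketch of how that count could be turned into a list of names: the BigQuery export documentation describes the wildcard as being replaced by a 12-digit, zero-padded sequence number starting at 000000000000, so the URIs can be reconstructed from the count. The helper names expand_wildcard_uri and exported_uris are made up for this illustration, and the sketch assumes at most one wildcard per destination URI.

```python
def expand_wildcard_uri(pattern, file_count):
    """Rebuild the URIs for a single-wildcard export pattern.

    Assumes the wildcard becomes a 12-digit, zero-padded counter
    that starts at 0, as described in the BigQuery export docs.
    """
    if "*" not in pattern:
        # No wildcard: the export wrote exactly this one object.
        return [pattern]
    prefix, suffix = pattern.split("*", 1)
    return [f"{prefix}{i:012d}{suffix}" for i in range(file_count)]


def exported_uris(extract_job):
    """Collect the reconstructed URIs for a finished extract job.

    extract_job is expected to be a completed ExtractJob; only its
    destination_uris and destination_uri_file_counts properties are read.
    """
    counts = extract_job.destination_uri_file_counts or []
    uris = []
    for pattern, count in zip(extract_job.destination_uris, counts):
        uris.extend(expand_wildcard_uri(pattern, count))
    return uris


# Usage, after extract_job.result() in the question's code:
#     for uri in exported_uris(extract_job):
#         print(uri)
```

Because the names are inferred rather than read back from the bucket, this is best treated as a convenience and not as proof that each object exists.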

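If you want stronger validation, you can also use the Cloud Storage library and list the objects that match the same pattern you supplied as part of the extract configuration.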
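One possible way to do that listing, as a minimal sketch: ask Cloud Storage for every object under the part of the pattern that precedes the wildcard, then filter the names against the full pattern on the client side. The function name list_exported_blobs is hypothetical; list_blobs with a prefix argument is part of the google-cloud-storage client. The client is passed in as a parameter so the function itself has no import-time dependency.

```python
import fnmatch


def list_exported_blobs(storage_client, bucket_name, pattern):
    """Return the sorted names of objects in the bucket that match pattern.

    pattern is the object-name part of the export URI, for example
    "table*.parquet" (without the gs://bucket/ prefix).
    """
    # Narrow the server-side listing to names sharing the fixed prefix.
    prefix = pattern.split("*", 1)[0]
    blobs = storage_client.list_blobs(bucket_name, prefix=prefix)
    # Apply the full wildcard pattern locally to drop unrelated objects.
    return sorted(b.name for b in blobs if fnmatch.fnmatchcase(b.name, pattern))


# Usage, assuming google-cloud-storage is installed and credentials are set up:
#     from google.cloud import storage
#     names = list_exported_blobs(storage.Client(), "bucket", "table*.parquet")
#     for name in names:
#         print("gs://bucket/" + name)
```

Comparing the length of this list with the file count reported by the extract job gives a simple consistency check. Note that the listing also picks up objects left over from earlier exports that used the same pattern.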
