
Is there a way to get a list of the files that were generated, from a large table, when exporting from BigQuery to GCS using a wildcard option?

I used a wildcard (*) export to split a large BigQuery table into separate files in GCS, following the code sample provided in GCP's docs:

from google.cloud import bigquery
client = bigquery.Client()
bucket_name = 'bucket'
project = "project"
dataset_id = "dataset"
table_id = "table"


destination_uri = "gs://{}/{}".format(bucket_name, "table*.parquet")
dataset_ref = bigquery.DatasetReference(project, dataset_id)
table_ref = dataset_ref.table(table_id)

extract_job = client.extract_table(
    table_ref,
    destination_uri,
    # Location must match that of the source table.
    location="US",
    # Without this, BigQuery exports CSV by default, regardless of
    # the .parquet extension in the destination URI.
    job_config=bigquery.job.ExtractJobConfig(destination_format="PARQUET"),
)  # API request
extract_job.result()  # Waits for job to complete.

print(
    "Exported {}:{}.{} to {}".format(project, dataset_id, table_id, destination_uri)
)

This generated 19 different files in my storage bucket, like mytable000000000000.parquet, mytable000000000001.parquet, and so on (up to 000000000019).

It would be nice to have an automatic way to get a list of these file names, so that I can either compose them together or loop over them to do something else. Is there an easy way to edit the code above to do this?

You don't get an explicit list when using a wildcard, but take a look at the destinationUriFileCounts field in the extract job statistics. It tells you how many files were produced. In Python, this is exposed as the ExtractJob.destination_uri_file_counts property.
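Since BigQuery names wildcard shards by replacing the * with a zero-padded 12-digit index, the shard count alone is enough to reconstruct the file names. A minimal sketch, assuming the "table*.parquet" pattern from the sample above (the shard_names helper is mine, not part of the client library):

```python
def shard_names(prefix: str, suffix: str, count: int) -> list:
    """Build the object names BigQuery assigns to wildcard export
    shards: the '*' is replaced by a zero-padded 12-digit index."""
    return ["{}{:012d}{}".format(prefix, i, suffix) for i in range(count)]


# After extract_job.result() has returned, the job statistics carry
# one count per destination URI (a single-element list here):
# num_files = extract_job.destination_uri_file_counts[0]
# files = shard_names("table", ".parquet", num_files)
```

For example, shard_names("table", ".parquet", 2) yields ["table000000000000.parquet", "table000000000001.parquet"].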

If you want stronger validation, you could also leverage the Cloud Storage libraries and list objects matching the same pattern(s) you supplied as part of the extract configuration.
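A sketch of that validation step with the google-cloud-storage library, assuming the bucket and "table*.parquet" pattern from the sample above. list_blobs can only filter by prefix on the server side, so the suffix check is done locally via a small helper:

```python
def matches_pattern(name: str, prefix: str, suffix: str) -> bool:
    """Check an object name against the fixed parts of a wildcard
    pattern like 'table*.parquet' (prefix='table', suffix='.parquet')."""
    return name.startswith(prefix) and name.endswith(suffix)


def list_exported_files(bucket_name: str, prefix: str, suffix: str) -> list:
    """List exported shard names in a GCS bucket (requires credentials)."""
    from google.cloud import storage

    client = storage.Client()
    # Server-side filter on the prefix, local filter on the suffix.
    blobs = client.list_blobs(bucket_name, prefix=prefix)
    return [b.name for b in blobs if matches_pattern(b.name, prefix, suffix)]


# names = list_exported_files("bucket", "table", ".parquet")
# assert len(names) == extract_job.destination_uri_file_counts[0]
```

Comparing the listed names against destination_uri_file_counts catches the case where a stale object from an earlier export happens to match the same pattern.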


