Is there a way to get a list of the files that were generated when exporting a large table from BigQuery to GCS using a wildcard URI?
I used a wildcard (*) in the destination URI to export a large BigQuery table into separate files in GCS. I used the code sample provided in GCP's docs:
from google.cloud import bigquery
client = bigquery.Client()
bucket_name = 'bucket'
project = "project"
dataset_id = "dataset"
table_id = "table"
destination_uri = "gs://{}/{}".format(bucket_name, "table*.parquet")
dataset_ref = bigquery.DatasetReference(project, dataset_id)
table_ref = dataset_ref.table(table_id)
extract_job = client.extract_table(
    table_ref,
    destination_uri,
    # Location must match that of the source table.
    location="US",
)  # API request
extract_job.result()  # Waits for job to complete.
print(
    "Exported {}:{}.{} to {}".format(project, dataset_id, table_id, destination_uri)
)
This generated 19 different files in my storage bucket, named mytable000000000000.parquet, mytable000000000001.parquet, and so on (up to 0000000000019).
It would be nice to have an automatic way to get a list of these file names so that I can either compose them together or loop over them to do something else. Is there an easy way to edit the code above to do this?
You don't get an explicit list when using a wildcard, but take a look at the destinationUriFileCounts field in the extract job statistics. It tells you how many files were written for each destination URI. In Python, this is exposed as the destination_uri_file_counts property of the extract job.
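As a rough sketch of how the file count could be turned into a list of names: BigQuery's documentation describes the wildcard as being replaced by a 12-digit, zero-padded sequence number starting at 0, so the URIs can be rebuilt from the pattern and the count. The helper name expected_export_uris is hypothetical, and the result is inferred from that naming convention rather than reported by the job.

```python
def expected_export_uris(destination_uri, file_count):
    """Rebuild the URIs a wildcard export is expected to produce.

    Assumes the single "*" is replaced by a 12-digit, zero-padded
    sequence number starting at 0 (the convention described in the
    BigQuery export docs). This is inferred, not reported by the job.
    """
    if destination_uri.count("*") != 1:
        raise ValueError("expected exactly one '*' in the destination URI")
    head, tail = destination_uri.split("*")
    return [f"{head}{i:012d}{tail}" for i in range(file_count)]


# Usage with the extract job from the question (not run here):
#   extract_job.result()
#   counts = extract_job.destination_uri_file_counts  # one count per destination URI
#   uris = expected_export_uris(destination_uri, counts[0])
```

Since the question passes a single destination URI, only the first entry of the counts list is relevant in that case.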
If you want stronger validation, you could also leverage the Cloud Storage libraries and list objects with the same pattern(s) you supplied as part of the extract configuration.
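A minimal sketch of that listing approach, assuming the google-cloud-storage package is installed: the text before the wildcard is passed as a prefix to list_blobs, and the full pattern is then applied client-side with fnmatch. The helper name list_exported_names is hypothetical, and the bucket and pattern in main() are the placeholders from the question.

```python
import fnmatch


def list_exported_names(storage_client, bucket_name, object_pattern):
    """Return the sorted object names in bucket_name that match object_pattern."""
    # Only the text before the first "*" can serve as a server-side prefix filter.
    prefix = object_pattern.split("*", 1)[0]
    blobs = storage_client.list_blobs(bucket_name, prefix=prefix)
    # Apply the full wildcard pattern on the client side.
    return sorted(
        blob.name for blob in blobs if fnmatch.fnmatchcase(blob.name, object_pattern)
    )


def main():
    # Imported here so the helper above can be used without the library loaded.
    from google.cloud import storage

    client = storage.Client()
    for name in list_exported_names(client, "bucket", "table*.parquet"):
        print(name)
```

Note that this returns every object matching the pattern, including leftovers from earlier exports to the same prefix, so comparing the result length against destination_uri_file_counts is a reasonable sanity check.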