Download multiple files from Google Cloud Storage using Python
I am trying to download multiple files from a Google Cloud Storage folder. I am able to download a single file, but not multiple files. I took this reference from this link, but it does not seem to work. The code is as follows:
# [download multiple files]
bucket_name = 'bigquery-hive-load'
# The "folder" where the files you want to download are
folder="/projects/bigquery/download/shakespeare/"
# Create this folder locally
if not os.path.exists(folder):
    os.makedirs(folder)
    # Retrieve all blobs with a prefix matching the folder
    bucket=storage_client.get_bucket(bucket_name)
    print(bucket)
    blobs=list(bucket.list_blobs(prefix=folder))
    print(blobs)
    for blob in blobs:
        if(not blob.name.endswith("/")):
            blob.download_to_filename(blob.name)
# [End download to multiple files]
Is there any way to download multiple files matching a pattern (name) or something similar? Since I am exporting the files from BigQuery, the file names will look like this:
shakespeare-000000000000.csv.gz
shakespeare-000000000001.csv.gz
shakespeare-000000000002.csv.gz
shakespeare-000000000003.csv.gz
Reference: working code to download a single file:
# [download to single files]
edgenode_destination_uri = '/projects/bigquery/download/shakespeare-000000000000.csv.gz'
bucket_name = 'bigquery-hive-load'
gcs_bucket = storage_client.get_bucket(bucket_name)
blob = gcs_bucket.blob("shakespeare.csv.gz")
blob.download_to_filename(edgenode_destination_uri)
logging.info('Downloaded {} to {}'.format(
    gcs_bucket, edgenode_destination_uri))
# [end download to single files]
After some trial and error, I solved this and couldn't stop myself from posting the solution here as well.
bucket_name = 'mybucket'
folder='/projects/bigquery/download/shakespeare/'
delimiter='/'
file = 'shakespeare'
# Retrieve all blobs whose names start with the file prefix.
bucket=storage_client.get_bucket(bucket_name)
# The delimiter restricts the listing to the top level of the bucket,
# so objects inside "folders" are excluded.
blobs=bucket.list_blobs(prefix=file, delimiter=delimiter)
for blob in blobs:
    print(blob.name)
    destination_uri = '{}/{}'.format(folder, blob.name)
    blob.download_to_filename(destination_uri)
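For the pattern-matching part of the question, here is a minimal sketch. It assumes the exported shards share a common prefix and that a shell-style pattern such as shakespeare-*.csv.gz is enough to select them. The helper name download_matching and its parameters are hypothetical and not part of the google-cloud-storage library; the sketch relies only on bucket.list_blobs(prefix=...), blob.name and blob.download_to_filename(...), the calls already used above.

```python
import fnmatch
import os


def download_matching(bucket, prefix, pattern, dest_dir):
    """Download every blob under `prefix` whose base name matches `pattern`.

    `bucket` is assumed to be a google.cloud.storage Bucket (or any object
    exposing list_blobs(prefix=...)). Returns the local paths written.
    """
    # Make sure the local target directory exists before downloading.
    os.makedirs(dest_dir, exist_ok=True)
    downloaded = []
    for blob in bucket.list_blobs(prefix=prefix):
        base_name = os.path.basename(blob.name)
        # Skip "folder" placeholder objects (names ending in "/") and
        # any object whose base name does not match the pattern.
        if not base_name or not fnmatch.fnmatch(base_name, pattern):
            continue
        local_path = os.path.join(dest_dir, base_name)
        blob.download_to_filename(local_path)
        downloaded.append(local_path)
    return downloaded
```

With a real client this might be called as download_matching(storage_client.get_bucket('mybucket'), 'shakespeare-', 'shakespeare-*.csv.gz', '/projects/bigquery/download/shakespeare'). The prefix narrows the listing on the server side, and the pattern is then applied locally to each object name.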
It looks like you may simply have the wrong level of indentation in your Python code. The block beginning with
# Retrieve all blobs with a prefix matching the folder
is within the scope of the if above, so it is never executed if the folder already exists.
Try this:
# [download multiple files]
bucket_name = 'bigquery-hive-load'
# The "folder" where the files you want to download are
folder="/projects/bigquery/download/shakespeare/"
# Create this folder locally
if not os.path.exists(folder):
    os.makedirs(folder)
# Retrieve all blobs with a prefix matching the folder
bucket=storage_client.get_bucket(bucket_name)
print(bucket)
blobs=list(bucket.list_blobs(prefix=folder))
print(blobs)
for blob in blobs:
    if(not blob.name.endswith("/")):
        blob.download_to_filename(blob.name)
# [End download to multiple files]
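One detail the snippet above leaves open is where each file lands. blob.download_to_filename(blob.name) writes to a local path equal to the object name, and the call will typically fail if the parent directory does not exist locally. Below is a small sketch of one way to handle that, assuming you want to mirror the object's path under a local directory of your choice. The helper name local_path_for is hypothetical, and the sketch does not guard against ".." components in object names.

```python
import os


def local_path_for(blob_name, dest_dir):
    """Map an object name to a path under dest_dir, creating parent directories."""
    # Object names use "/" as the separator; strip any leading "/" so the
    # joined path stays inside dest_dir.
    relative = blob_name.lstrip("/")
    local_path = os.path.join(dest_dir, *relative.split("/"))
    # Create the parent directories so the download has somewhere to write.
    os.makedirs(os.path.dirname(local_path), exist_ok=True)
    return local_path
```

Inside the loop, the download line would then become blob.download_to_filename(local_path_for(blob.name, dest_dir)), where dest_dir is whatever local directory you pick.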