
How to move files in Google Cloud Storage from one bucket to another bucket by Python

Is there any API function that allows us to move files in Google Cloud Storage from one bucket to another bucket?

The scenario is that we want Python to move files that have already been read in bucket A over to bucket B. I know that gsutil can do that, but I'm not sure whether Python supports it.

Thanks.

Here's a function I use when moving blobs between directories within the same bucket or to a different bucket.

from google.cloud import storage
import os

os.environ["GOOGLE_APPLICATION_CREDENTIALS"]="path_to_your_creds.json"

def mv_blob(bucket_name, blob_name, new_bucket_name, new_blob_name):
    """
    Function for moving files between directories or buckets. It will use GCP's copy
    function, then delete the blob from the old location.

    inputs
    -----
    bucket_name: name of the source bucket
    blob_name: str, name of the file
        ex. 'data/some_location/file_name'
    new_bucket_name: name of the destination bucket (can be the same as the original if we're just moving around directories)
    new_blob_name: str, name of the file in the new directory in the target bucket
        ex. 'data/destination/file_name'
    """
    storage_client = storage.Client()
    source_bucket = storage_client.get_bucket(bucket_name)
    source_blob = source_bucket.blob(blob_name)
    destination_bucket = storage_client.get_bucket(new_bucket_name)

    # copy to the new destination
    new_blob = source_bucket.copy_blob(
        source_blob, destination_bucket, new_blob_name)
    # delete from the old location
    source_blob.delete()

    print(f'File moved from {blob_name} to {new_blob_name}')
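For example, with hypothetical bucket and object names (placeholders, not from the original answer), a call might look like this:

mv_blob("my-source-bucket", "data/some_location/file_name",
        "my-archive-bucket", "data/destination/file_name")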

Using the google-api-python-client, there is an example on the storage.objects.copy page. After you copy, you can delete the source with storage.objects.delete.

destination_object_resource = {}
req = client.objects().copy(
        sourceBucket=bucket1,
        sourceObject=old_object,
        destinationBucket=bucket2,
        destinationObject=new_object,
        body=destination_object_resource)
resp = req.execute()
print(json.dumps(resp, indent=2))

client.objects().delete(
        bucket=bucket1,
        object=old_object).execute()
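The snippet above assumes a client object has already been built. One way to construct it (a sketch, assuming application default credentials are available in the environment) is via the discovery-based client:

import json

from googleapiclient import discovery

# Build a client for the Cloud Storage JSON API, v1.
# Uses application default credentials found in the environment.
client = discovery.build('storage', 'v1')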

You can use the GCS Client Library functions documented at [1] to read from one bucket, write to the other, and then delete the source file.

You can even use the GCS REST API documented at [2].

Links:
[1] - https://developers.google.com/appengine/docs/python/googlecloudstorageclient/functions
[2] - https://developers.google.com/storage/docs/concepts-techniques#overview
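As a rough sketch of that read-then-write approach, here is what it could look like with the current google-cloud-storage client instead of the App Engine library linked above (bucket and object names are placeholders, not from the original answer):

from google.cloud import storage

def move_by_rewrite(src_bucket_name, src_blob_name, dst_bucket_name, dst_blob_name):
    """Read an object from one bucket, write it to another, then delete the source."""
    client = storage.Client()
    src_blob = client.bucket(src_bucket_name).blob(src_blob_name)
    dst_blob = client.bucket(dst_bucket_name).blob(dst_blob_name)

    # Download the source object's bytes and re-upload them to the destination.
    data = src_blob.download_as_bytes()
    dst_blob.upload_from_string(data)

    # Remove the original only after the copy has succeeded.
    src_blob.delete()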

from google.cloud import storage

storage_client = storage.Client()

def GCP_BUCKET_A_TO_B():
    # Copy every object from bucket A into bucket B, keeping the same object names.
    source_bucket = storage_client.get_bucket("Bucket_A_Name")
    destination_bucket = storage_client.get_bucket("Bucket_B_Name")
    filenames = [blob.name for blob in source_bucket.list_blobs(prefix="")]
    for name in filenames:
        source_blob = source_bucket.blob(name)
        new_blob = source_bucket.copy_blob(source_blob, destination_bucket, name)

I just wanted to point out that there's another possible approach: using gsutil through the subprocess module.

The advantages of using gsutil like that:

  • You don't have to deal with individual blobs
  • gsutil's implementation of the move, and especially rsync, will probably be much better and more resilient than what we do ourselves.

The disadvantages:

  • You can't deal with individual blobs easily
  • It's hacky, and generally a library is preferable to executing shell commands

Example:

import subprocess

def move(source_uri: str,
         destination_uri: str) -> None:
    """
    Move file from source_uri to destination_uri.

    :param source_uri: gs:// - like uri of the source file/directory
    :param destination_uri: gs:// - like uri of the destination file/directory
    :return: None
    """
    # Pass the command as a list so subprocess.run works without shell=True.
    cmd = ["gsutil", "-m", "mv", source_uri, destination_uri]
    subprocess.run(cmd, check=True)
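A hypothetical invocation (the URIs are placeholders) would be:

move("gs://my-source-bucket/data/", "gs://my-destination-bucket/data/")

Note that the -m flag tells gsutil to perform the transfer with parallel operations, which helps when moving many files.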
