python boto3 s3 client: filtering subdirectories and depth

Question

I am trying to get a list of the subdirectories in an S3 bucket without returning any file names.

My S3 bucket has the following structure.

s3://my-bucket/databases/mysql-<date>-<hour>    # host-2022-09-09-10
s3://my-bucket/databases/mysql-<date>-<hour>/tarfiles.tar.gz

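I am trying to return only the directory names, such as mysql-<date>-<hour>. I do not need any of the subdirectories or file names inside mysql-xx.

Since everything is stored as an object, I could not find any solution such as setting a depth level.

My code: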

        s3 = boto3.resource('s3')
        my_bucket = s3.Bucket(S3_BUCKET)
        prefix = 'databases/mysql-'
        for item in my_bucket.objects.filter(Prefix=prefix):
            st.write(item.key)

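Another option would be to grep/filter the file names in Python. But that does not help, because every request would still scan and return all the files, and the whole list would then have to be filtered. That becomes unnecessarily expensive.

Thanks!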

Answer 1

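You want to list the common prefixes under a given prefix.

This is supported by the underlying API, though boto3's "resource" object model does not support showing the prefixes for a given resource. To do this, you need to use the lower-level "client" interface: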

import boto3

prefix = 'databases/mysql-'
s3 = boto3.client('s3')
paginator = s3.get_paginator("list_objects_v2")
# Specify the prefix to scan, and the delimiter that groups keys into common prefixes
for page in paginator.paginate(Bucket=S3_BUCKET, Prefix=prefix, Delimiter='/'):
    # A page with no common prefixes omits the key, hence the empty default
    for common_prefix in page.get("CommonPrefixes", []):
        print(common_prefix['Prefix'])
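
If only the bare directory names are wanted (for example mysql-2022-09-09-10 rather than databases/mysql-2022-09-09-10/), the same listing can be wrapped in a small helper. This is a minimal sketch, not part of boto3: the function name list_subdirectories and its parameters are hypothetical, and it assumes the delimiter is '/' and that the client behaves like the one returned by boto3.client('s3').

```python
def list_subdirectories(client, bucket, prefix):
    """Return the names of the "directories" that start with `prefix`.

    `client` is assumed to be an S3 client such as boto3.client("s3").
    Only common prefixes are read, so no object keys are returned.
    """
    paginator = client.get_paginator("list_objects_v2")
    names = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter="/"):
        for common_prefix in page.get("CommonPrefixes", []):
            # "databases/mysql-2022-09-09-10/" -> "mysql-2022-09-09-10"
            full_path = common_prefix["Prefix"].rstrip("/")
            names.append(full_path.rsplit("/", 1)[-1])
    return names


# Usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   for name in list_subdirectories(boto3.client("s3"), "my-bucket", "databases/mysql-"):
#       print(name)
```

Because the delimiter is applied by S3 itself, the response contains one entry per directory instead of one per object, so nothing has to be filtered on the client side.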
