
Python boto3, list contents of specific dir in bucket, limit depth

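This is the same as this question, but I also want to limit the depth that is returned.

Currently, all of the answers return every object under the specified prefix. I want to see only what is at the current level of the hierarchy.

Current code, which returns everything: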

self._session = boto3.Session(
    aws_access_key_id=aws_access_key_id,
    aws_secret_access_key=aws_secret_access_key,
)
self._s3 = self._session.resource("s3")
bucket = self._s3.Bucket(bucket_name)
detections_contents = bucket.objects.filter(Prefix=prefix)
for object_summary in detections_contents:
    print(object_summary.key)

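How can I see only the files and folders directly under prefix? How can I go n levels deep?

I could fetch everything and parse it locally, but that is clearly not what I am looking for here.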

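Without fetching every object in the directory, there is no clear-cut way to do this with list objects. There is, however, an approach based on S3 Select, which uses SQL query syntax to retrieve file contents as well as object keys n levels deep. If you are comfortable writing SQL, use that. Be aware, though, that select_object_content requires a Key argument and queries the contents of that one object, so on its own it does not enumerate the keys under a prefix. Reference documentation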

import boto3
import json

s3 = boto3.client('s3')

bucket_name = 'my-bucket'
prefix = 'my-directory/subdirectory/'

input_serialization = {
    'CompressionType': 'NONE',
    'JSON': {
        'Type': 'LINES'
    }
}
output_serialization = {
    'JSON': {}
}

# S3 Select queries the contents of ONE object, so a Key is required.
# Assumption (not in the original answer): object_key names a JSON Lines
# object, such as a key listing, whose records each have a "key" field.
object_key = 'my-directory/listing.json'  # hypothetical object

# Select the "key" field of every record whose key starts with the prefix
expression = f"SELECT s.key FROM S3Object s WHERE s.key LIKE '{prefix}%'"

response = s3.select_object_content(
    Bucket=bucket_name,
    Key=object_key,
    ExpressionType='SQL',
    Expression=expression,
    InputSerialization=input_serialization,
    OutputSerialization=output_serialization
)

# The response Payload is an event stream; Records events carry the selected data
for event in response['Payload']:
    if 'Records' in event:
        records = event['Records']['Payload']
        # JSON output is one document per line, so parse line by line.
        # A record may span two events; buffering is omitted in this sketch.
        for line in records.decode('utf-8').splitlines():
            if line:
                print(json.loads(line)['key'])

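Neither Boto3 nor the S3 API has a built-in way to do this. You will need some version of processing each level and, in turn, asking for the list of objects at that level: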

import boto3

s3 = boto3.client('s3')
bucket_name = 'my-bucket'  # placeholder bucket name
max_depth = 2

paginator = s3.get_paginator('list_objects_v2')
# Track all prefixes to show with a list
common_prefixes = [(0, "")]
while len(common_prefixes) > 0:
    # Pull out the next prefix to show
    current_depth, current_prefix = common_prefixes.pop(0)

    # Loop through all of the items using a paginator to handle common prefixes with more
    # than a thousand items
    for page in paginator.paginate(Bucket=bucket_name, Prefix=current_prefix, Delimiter='/'):
        for cur in page.get("CommonPrefixes", []):
            # Show each common prefix, here just use a format like AWS CLI does
            print(" " * 27 + f"PRE {cur['Prefix']}")
            if current_depth < max_depth:
                # This is below the max depth we want to show, so 
                # add it to the list to be shown
                common_prefixes.append((current_depth + 1, cur['Prefix']))
        for cur in page.get("Contents", []):
            # Show each item sharing this common prefix using a format like the AWS CLI
            print(f"{cur['LastModified'].strftime('%Y-%m-%d %H:%M:%S')}{cur['Size']:11d} {cur['Key']}")
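The loop above starts at the bucket root and prints as it goes. To start from the prefix in the question and collect the results instead, the same level-by-level idea could be wrapped in a generator. This is a minimal sketch, not part of the original answer: the function name list_to_depth, its parameters, and the "PRE"/"OBJ" labels are all hypothetical, and the client argument is assumed to be a boto3 S3 client.

```python
from collections import deque


def list_to_depth(client, bucket, prefix="", max_depth=0):
    """Yield (depth, kind, name) for entries up to max_depth levels below prefix.

    kind is "PRE" for a common prefix ("folder") and "OBJ" for an object key.
    client is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    max_depth=0 lists only the level directly under prefix.
    """
    paginator = client.get_paginator("list_objects_v2")
    # Breadth-first queue of (depth, prefix) pairs still to be listed
    queue = deque([(0, prefix)])
    while queue:
        depth, current = queue.popleft()
        # Delimiter="/" makes S3 group deeper keys into CommonPrefixes
        for page in paginator.paginate(Bucket=bucket, Prefix=current, Delimiter="/"):
            for cp in page.get("CommonPrefixes", []):
                yield depth, "PRE", cp["Prefix"]
                if depth < max_depth:
                    # Only descend while we are above the depth limit
                    queue.append((depth + 1, cp["Prefix"]))
            for obj in page.get("Contents", []):
                yield depth, "OBJ", obj["Key"]
```

With the resource object from the question, the underlying client is available as bucket.meta.client, so a call could look like list_to_depth(bucket.meta.client, bucket_name, prefix, max_depth=1). Note that prefix should end with "/" so that listing starts inside the "directory" rather than matching sibling names that share the same leading characters.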
