
Python boto3, list contents of specific dir in bucket, limit depth

This is the same as this question, but I also want to limit the depth returned.

Currently, all answers return all the objects after the specified prefix. I want to see just what's in the current hierarchy level.

Current code that returns everything:

self._session = boto3.Session(
    aws_access_key_id=aws_access_key_id,
    aws_secret_access_key=aws_secret_access_key,
)
self._s3 = self._session.resource("s3")
bucket = self._s3.Bucket(bucket_name)
detections_contents = bucket.objects.filter(Prefix=prefix)
for object_summary in detections_contents:
    print(object_summary.key)

How can I see only the files and folders directly under prefix? And how can I go n levels deep?

I could parse everything locally, but that is clearly not what I am looking for here.

There is no definitive way to do this with list-objects calls without retrieving every object under the prefix. There is, however, an approach using S3 Select, which takes a SQL-like query and can pull out object keys (as well as file contents) n levels deep. If you are fine with writing SQL, use this. reference doc

import boto3
import json

s3 = boto3.client('s3')

bucket_name = 'my-bucket'
prefix = 'my-directory/subdirectory/'
# S3 Select queries a single object, so point it at a JSON-lines
# listing of keys (for example an S3 Inventory file)
object_key = 'my-inventory.json'

input_serialization = {
    'CompressionType': 'NONE',
    'JSON': {
        'Type': 'LINES'
    }
}
output_serialization = {
    'JSON': {}  # records are emitted as newline-delimited JSON
}

# SQL expression selecting the key field for all records under the prefix
expression = "SELECT s.key FROM S3Object s WHERE s.key LIKE '" + prefix + "%'"

response = s3.select_object_content(
    Bucket=bucket_name,
    Key=object_key,
    ExpressionType='SQL',
    Expression=expression,
    InputSerialization=input_serialization,
    OutputSerialization=output_serialization
)

# The response contains a Payload event stream with the selected data
for event in response['Payload']:
    if 'Records' in event:
        records = event['Records']['Payload'].decode('utf-8')
        # Each non-empty line is one JSON record with a "key" field
        for line in records.splitlines():
            if line:
                item = json.loads(line)
                print(item['key'])

There is no built-in way to do this with the boto3 or S3 APIs. You'll need some version of processing each level in turn, asking for the list of objects at that level:

import boto3

s3 = boto3.client('s3')
bucket_name = 'my-bucket'
max_depth = 2

paginator = s3.get_paginator('list_objects_v2')
# Track all prefixes still to be shown as a queue of (depth, prefix) pairs
common_prefixes = [(0, "")]
while len(common_prefixes) > 0:
    # Pull out the next prefix to show
    current_depth, current_prefix = common_prefixes.pop(0)

    # Loop through all of the items using a paginator to handle common prefixes with more
    # than a thousand items
    for page in paginator.paginate(Bucket=bucket_name, Prefix=current_prefix, Delimiter='/'):
        for cur in page.get("CommonPrefixes", []):
            # Show each common prefix, here just use a format like AWS CLI does
            print(" " * 27 + f"PRE {cur['Prefix']}")
            if current_depth < max_depth:
                # Not yet at the maximum depth we want to show, so
                # queue this prefix to be listed in turn
                common_prefixes.append((current_depth + 1, cur['Prefix']))
        for cur in page.get("Contents", []):
            # Show each item sharing this common prefix using a format like the AWS CLI
            print(f"{cur['LastModified'].strftime('%Y-%m-%d %H:%M:%S')}{cur['Size']:11d} {cur['Key']}")
