
How to filter out glacier files with boto3?

I'm writing a script to parse files in S3 buckets without downloading them locally. The code works as long as it doesn't hit Glacier files. I'm adding an exception handler for now (the error handling looks better in the actual code, I promise), but ideally I'd like to filter Glacier files out up front.

Here is my code:

import boto3
import gzip


try:
    s3_client = boto3.client('s3')
    bucket = 'my_bucket'
    key = 'path_to_file/file_name.csv.gz'
    obj = s3_client.get_object(Bucket=bucket, Key=key)
    # obj['Body'] is a file-like StreamingBody, so gzip can decompress it
    # without downloading the file first
    with gzip.open(obj['Body'], 'rt') as gf:
        for ln in gf:
            print(ln)
except Exception as e:
    print(e)
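For what it's worth, when `get_object` does hit an archived object, boto3 raises a `botocore.exceptions.ClientError` whose error code is `InvalidObjectState`, so the broad `except` above can be narrowed. A minimal sketch (the helper name and the bucket/key are my own, hypothetical choices):

```python
def is_archived_error(error_response):
    """True if a ClientError response means the object is archived
    (e.g. in Glacier or Deep Archive) and must be restored before reading."""
    return error_response.get("Error", {}).get("Code") == "InvalidObjectState"

# Usage (assumes boto3 is configured; bucket/key names are hypothetical):
# from botocore.exceptions import ClientError
# try:
#     obj = s3_client.get_object(Bucket="my_bucket", Key="path_to_file/file_name.csv.gz")
# except ClientError as err:
#     if is_archived_error(err.response):
#         pass  # skip the object, or start a restore and retry later
#     else:
#         raise
```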

I see that with the AWS CLI I can at least sort objects so that Glacier files end up at the bottom, so there must be a way to either sort or filter them out in boto3 as well:

aws s3api list-objects --bucket my-bucket --query "reverse(sort_by(Contents,&LastModified))"

Solved by filtering on StorageClass == 'STANDARD' (as opposed to 'GLACIER'):

bucket = 'my_bucket'
prefix = 'path/to/files/'
s3_client = boto3.client('s3')
response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix)
for file in response.get('Contents', []):
    if file['StorageClass'] == 'STANDARD':
        # the 'directory' placeholder key ends in '/', so its last path
        # segment is empty; skip it
        file_name = file['Key'].rsplit('/', 1)[1]
        if file_name != '':
            obj = s3_client.get_object(Bucket=bucket, Key=prefix + file_name)
            # read only the first 10 lines of each file
            first_lines = []
            with gzip.open(obj['Body'], 'rt') as gf:
                for i, ln in enumerate(gf, start=1):
                    first_lines.append(ln.rstrip())
                    if i == 10:
                        break
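One caveat with `== 'STANDARD'`: it also drops objects in readable classes such as STANDARD_IA or INTELLIGENT_TIERING. A slightly more robust variant is to exclude only the classes that actually block `get_object`. A sketch under that assumption (function name and bucket/prefix are hypothetical):

```python
# Storage classes whose objects cannot be read with get_object until restored.
# (GLACIER_IR / Glacier Instant Retrieval objects *are* directly readable.)
ARCHIVED_CLASSES = {"GLACIER", "DEEP_ARCHIVE"}

def readable_keys(contents, archived=ARCHIVED_CLASSES):
    """Yield keys that can be fetched immediately, skipping archived objects
    and 'directory' placeholder keys that end in '/'."""
    for obj in contents:
        if obj.get("StorageClass", "STANDARD") in archived:
            continue
        if obj["Key"].endswith("/"):
            continue
        yield obj["Key"]

# Usage with pagination (list_objects_v2 returns at most 1000 keys per call;
# the bucket/prefix names are hypothetical):
# import boto3
# s3 = boto3.client("s3")
# paginator = s3.get_paginator("list_objects_v2")
# for page in paginator.paginate(Bucket="my_bucket", Prefix="path/to/files/"):
#     for key in readable_keys(page.get("Contents", [])):
#         print(key)
```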

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 