

How to filter out Glacier files with boto3?

I'm writing a script to parse files in S3 buckets without needing to download them locally. The code works as long as it doesn't hit Glacier files. For now I'm catching the exception (error handling looks better in the actual code, I promise), but ideally I'd like to know whether it's possible to filter Glacier files out.

Here is my code:

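import gzip

import boto3

s3_client = boto3.client('s3')
bucket = 'my_bucket'
key = 'path_to_file/file_name.csv.gz'

try:
    obj = s3_client.get_object(Bucket=bucket, Key=key)
    body = obj['Body']  # streaming body: read directly, no local download
    with gzip.open(body, 'rt') as gf:
        for ln in gf:
            print(ln, end='')  # placeholder for the actual parsing
except Exception as e:
    # GetObject on a GLACIER / DEEP_ARCHIVE object that has not been restored
    # fails with an InvalidObjectState error
    print(f'Skipping {key}: {e}')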
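When the script already knows the key (as above), one option is to ask S3 for the object's metadata before fetching it. This is a minimal sketch, assuming the client passed in is `boto3.client('s3')`; `is_readable` and `ARCHIVE_CLASSES` are hypothetical names, and treating only `GLACIER` and `DEEP_ARCHIVE` as archive tiers is an assumption. It relies on HeadObject omitting `StorageClass` for STANDARD objects and reporting a completed restore in the `Restore` field.

```python
# Sketch: check an object's storage class with HeadObject before GetObject.
# The S3 client is passed in, e.g. boto3.client('s3').

# Assumption: these archive tiers need a restore before they can be read.
# (GLACIER_IR, Glacier Instant Retrieval, is readable directly.)
ARCHIVE_CLASSES = {"GLACIER", "DEEP_ARCHIVE"}


def is_readable(s3_client, bucket, key):
    """Return True if GetObject is expected to succeed for bucket/key."""
    head = s3_client.head_object(Bucket=bucket, Key=key)
    # HeadObject omits StorageClass for STANDARD objects.
    storage_class = head.get("StorageClass", "STANDARD")
    if storage_class not in ARCHIVE_CLASSES:
        return True
    # An archived object is readable only after a completed restore, which
    # HeadObject reports as: ongoing-request="false", expiry-date="..."
    return 'ongoing-request="false"' in head.get("Restore", "")
```

Calling it with the same bucket and key that `get_object` would use lets the script skip archived files instead of relying on the exception. The trade-off is one extra request per object; when many keys are processed, filtering the listing is cheaper.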

Using the AWS CLI, I can at least sort the files so that the Glacier files end up at the bottom, so there must be a way to either sort or filter them out in boto3:

aws s3api list-objects --bucket my-bucket --query "reverse(sort_by(Contents,&LastModified))"
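The CLI's `--query` is a JMESPath expression evaluated client-side on the listing, so in boto3 the counterpart is to filter the listed objects in Python. Sorting by `LastModified` only groups Glacier objects together when they also happen to be the oldest ones (for example, when a lifecycle rule archived them), so filtering on `StorageClass` is more direct. A minimal sketch, assuming an S3 client from `boto3.client('s3')`; `readable_keys` and `SKIP_CLASSES` are hypothetical names, and the set of skipped classes is an assumption. It uses the `list_objects_v2` paginator because a single listing call returns at most 1,000 keys.

```python
# Sketch: list keys under a prefix, skipping archived storage classes.
# The S3 client is passed in, e.g. boto3.client('s3').

# Assumption: skip only the archive tiers that need a restore first.
SKIP_CLASSES = frozenset({"GLACIER", "DEEP_ARCHIVE"})


def readable_keys(s3_client, bucket, prefix, skip_classes=SKIP_CLASSES):
    """Yield keys under prefix whose storage class is not in skip_classes."""
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent from a page when nothing matches the prefix.
        for obj in page.get("Contents", []):
            if obj["Key"].endswith("/"):
                continue  # zero-byte "folder" placeholder
            if obj.get("StorageClass", "STANDARD") in skip_classes:
                continue
            yield obj["Key"]
```

Denylisting the archive classes, rather than allowlisting `STANDARD`, keeps objects in directly readable classes such as `STANDARD_IA` in the results.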

Solved by checking StorageClass == 'STANDARD' (as opposed to 'GLACIER'):

import gzip

import boto3

bucket = 'my_bucket'
prefix = 'path/to/files/'
s3_client = boto3.client('s3')
# list_objects returns at most 1,000 keys per call
response = s3_client.list_objects(Bucket=bucket, Prefix=prefix)
for file in response.get('Contents', []):
    # Only read STANDARD objects; GLACIER ones fail in get_object until restored
    if file['StorageClass'] == 'STANDARD':
        key = file['Key']
        if not key.endswith('/'):  # skip "folder" placeholder keys
            obj = s3_client.get_object(Bucket=bucket, Key=key)
            body = obj['Body']
            lns = []
            with gzip.open(body, 'rt') as gf:
                for ln in gf:
                    lns.append(ln)
                    if len(lns) == 10:  # keep only the first 10 lines
                        break
            print(key, lns)
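The manual line counter can also be written with `itertools.islice`, which stops pulling lines from the decompressor once it has enough. A minimal sketch; `head_lines` is a hypothetical helper name, and it assumes the stream it receives (such as `obj['Body']` from `get_object`) is a readable binary file-like object, which is how the code above already uses it with `gzip.open`.

```python
import gzip
import io
from itertools import islice


def head_lines(fileobj, n=10):
    """Return the first n lines of a gzip-compressed binary stream as text.

    gzip decompresses incrementally and islice stops after n lines, so
    reading ends shortly after the nth line rather than at end of stream.
    """
    with gzip.open(fileobj, "rt") as gf:
        return list(islice(gf, n))


if __name__ == "__main__":
    # Local demo: an in-memory gzip file stands in for obj['Body'].
    data = gzip.compress("".join(f"row{i}\n" for i in range(25)).encode())
    print(head_lines(io.BytesIO(data), 3))  # ['row0\n', 'row1\n', 'row2\n']
```

With it, the inner `with` block reduces to `lns = head_lines(body)`.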
