
Loop through compressed gzip files throws "ERROR [Errno 2] No such file or directory: 'part-r-00001.gz'" at second iteration - Python

I am looping through multiple files within an S3 bucket. The first iteration works perfectly fine, but as soon as I move on to the next file I receive "ERROR [Errno 2] No such file or directory: 'part-r-00001.gz'". (part-r-00000.gz was accessed correctly.)

I am not sure why the file is not found, as it is available in the bucket.

This is the code:

import gzip
import logging
import os
import sys
from datetime import datetime, timedelta

import boto3

logger = logging.getLogger(__name__)

BUCKET = 'bucket'
PREFIX = 'path'

now = datetime.utcnow()
today = (now - timedelta(days=2)).strftime('%Y-%m-%d')
folder_of_the_day = PREFIX + today + '/'
logger.info("map folder: %s", folder_of_the_day)

client = boto3.client('s3')
response = client.list_objects_v2(Bucket=BUCKET, Prefix=folder_of_the_day)
for content in response.get('Contents', []):
    bucket_file = os.path.split(content["Key"])[-1]
    if bucket_file.endswith('.gz'):
        logger.info("----- starting with file: %s -----", bucket_file)
        try:
            with gzip.open(bucket_file, mode="rt") as file:
                for line in file:
                    pass  # do something

        except Exception as e:
            logger.error(e)
            logger.critical("Failed to open file!")
            sys.exit(4)

Once executed for the second round, this is the output:

2022-06-18 12:14:48,027 [root] INFO ----- starting with file: part-r-00001.gz -----
2022-06-18 12:14:48,028 [root] ERROR [Errno 2] No such file or directory: 'part-r-00001.gz'

Update: Based on the comment I updated my code to a proper gzip method, but the error remains. Once the first iteration is done, the second file is still not found.

This is the updated code:

try:
    with gzip.GzipFile(bucket_file) as gzipfile:
        decompressed_content = gzipfile.read()
        for line in decompressed_content.splitlines():
            # do something
            break

except Exception as e:
    logger.error(e)
    logger.critical("Failed to open file!")
    sys.exit(4)

I think you cannot use gzip.open on the S3 path directly: gzip.open expects a local filesystem path, but bucket_file is only a key name inside the bucket.

You may need a proper gzip method to read files from the S3 bucket.

Reading contents of a gzip file from a AWS S3 in Python
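
A minimal sketch of one way to do that, reusing the BUCKET and folder_of_the_day variables from the question; streaming the object with get_object and wrapping it in io.TextIOWrapper is my suggestion here, not something taken from the original post:

import gzip
import io

import boto3

client = boto3.client('s3')
response = client.list_objects_v2(Bucket=BUCKET, Prefix=folder_of_the_day)
for content in response.get('Contents', []):
    key = content["Key"]
    if not key.endswith('.gz'):
        continue
    # get_object returns a StreamingBody; GzipFile can wrap it directly,
    # so the object is decompressed as it streams and never hits local disk.
    obj = client.get_object(Bucket=BUCKET, Key=key)
    with gzip.GzipFile(fileobj=obj["Body"]) as gz:
        for line in io.TextIOWrapper(gz, encoding="utf-8"):
            pass  # do something with line

Alternatively, client.download_file(BUCKET, content["Key"], bucket_file) would first copy each object to a local file, after which the original gzip.open(bucket_file, mode="rt") call works as written.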
