I am looping through multiple files in an S3 bucket. The first iteration works fine, but on the second one I get "ERROR [Errno 2] No such file or directory: 'part-r-00001.gz'" (part-r-00000.gz was read correctly).
I am not sure why the file is not found, since it exists in the bucket.
This is the code:
import gzip
import logging
import os
import sys
from datetime import datetime, timedelta

import boto3

logger = logging.getLogger()

BUCKET = 'bucket'
PREFIX = 'path'

now = datetime.utcnow()
today = (now - timedelta(days=2)).strftime('%Y-%m-%d')
folder_of_the_day = PREFIX + today + '/'
logger.info("map folder: %s", folder_of_the_day)

client = boto3.client('s3')
response = client.list_objects_v2(Bucket=BUCKET, Prefix=folder_of_the_day)
for content in response.get('Contents', []):
    bucket_file = os.path.split(content["Key"])[-1]
    if bucket_file.endswith('.gz'):
        logger.info("----- starting with file: %s -----", bucket_file)
        try:
            with gzip.open(bucket_file, mode="rt") as file:
                for line in file:
                    ...  # do something
        except Exception as e:
            logger.error(e)
            logger.critical("Failed to open file!")
            sys.exit(4)
Once executed for the second round, this is the output:
2022-06-18 12:14:48,027 [root] INFO ----- starting with file: part-r-00001.gz -----
2022-06-18 12:14:48,028 [root] ERROR [Errno 2] No such file or directory: 'part-r-00001.gz'
Update: Based on the comment I switched to a different gzip method, but the error remains. Once the first iteration is done, the second file is still not found.
This is the updated code:
try:
    with gzip.GzipFile(bucket_file) as gzipfile:
        decompressed_content = gzipfile.read()
        for line in decompressed_content.splitlines():
            ...  # do something
    break
You cannot use gzip.open (or gzip.GzipFile) on an S3 key directly: both expect a path on the local filesystem, and list_objects_v2 only returns key names, not file contents. The first iteration presumably succeeded only because a copy of part-r-00000.gz happened to exist in your working directory. You need to either download each object first, or fetch its bytes with get_object and decompress them in memory.
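As a minimal sketch of the in-memory approach: fetch the object with get_object, buffer the response body, and hand it to gzip.GzipFile via its fileobj parameter. The function names here (iter_gzip_lines, read_s3_gzip) are illustrative, not part of any library.

import gzip
import io

def iter_gzip_lines(fileobj):
    """Decompress a gzip stream from a file-like object and yield text lines."""
    with gzip.GzipFile(fileobj=fileobj) as gz:
        for raw in gz:
            yield raw.decode('utf-8').rstrip('\n')

def read_s3_gzip(client, bucket, key):
    """Fetch a .gz object from S3 and yield its decoded lines."""
    # Imported here so the gzip helper above works without boto3 installed.
    obj = client.get_object(Bucket=bucket, Key=key)
    # obj['Body'] is a streaming body; buffer it so GzipFile can read/seek it.
    yield from iter_gzip_lines(io.BytesIO(obj['Body'].read()))

In your loop you would then pass content["Key"] itself (not just the basename from os.path.split) to read_s3_gzip, since S3 needs the full key to locate the object.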