
Reading only specific-format files from an S3 bucket directory using boto3 and Python

In my S3 bucket directory I have files of multiple types: .csv, .log, .txt, etc. I need to read only the .log files from a single directory and append them using boto3. I tried the code below, but it reads the data from all files — I can't restrict it to *.log — and the result comes back as a single string with '\n' separators, as shown below.
How can I read only the log files and merge them so the result comes out line by line?

    import boto3

    s3 = boto3.resource('s3')
    my_bucket = s3.Bucket('my_bucket')

    lst = []
    for obj in my_bucket.objects.filter(Prefix="bulk_data/all_files/"):
        print(obj.key)
        body = obj.get()['Body'].read().decode('utf-8')
        lst.append(body)
    print(lst)

The lst output comes out as a single string with '\n' as the separator: '12345,6006,7290,7200,JKHBJ,S,55\n44345,6996,6290,7288,JKHkk,R,57\n..........'

I should get something like below:

12345,6006,7290,7200,JKHBJ,S,55

44345,6996,6290,7288,JKHkk,R,57

...

The filter accepts only a Prefix, not a suffix, so you have to filter the keys yourself, for example:

    import boto3

    s3 = boto3.resource('s3')
    my_bucket = s3.Bucket('my_bucket')

    lst = []
    for s3obj in my_bucket.objects.filter(Prefix="bulk_data/all_files/"):
        # skip S3 objects whose key does not end with .log
        if not s3obj.key.endswith('.log'):
            continue

        print(s3obj.key)
        body = s3obj.get()['Body'].read().decode('utf-8')
        lst.append(body)

    # print the merged contents line by line
    for file_str in lst:
        for line in file_str.split('\n'):
            print(line)
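One detail worth noting: split('\n') produces an empty last element when a file ends with a trailing newline, whereas str.splitlines() does not. A minimal sketch of the merge step on plain strings (the sample data here is hypothetical, standing in for the decoded S3 object bodies):

```python
# Hypothetical decoded bodies of two .log files fetched from S3.
bodies = [
    "12345,6006,7290,7200,JKHBJ,S,55\n44345,6996,6290,7288,JKHkk,R,57\n",
    "99999,1111,2222,3333,ABCDE,T,10\n",
]

# splitlines() drops the trailing newline, so no empty last element appears.
merged = []
for body in bodies:
    merged.extend(body.splitlines())

for line in merged:
    print(line)
```

This way the merged result is a clean list of lines, one per record, with no blank entries between files.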
