
Reading only specific format files from an S3 bucket directory using boto3 and Python

In my S3 bucket directory I have files of several types (.csv, .log, .txt, etc.), but I need to read only the .log files from a single directory and append their contents using boto3. I tried the code below, but it reads the data of every file; I cannot restrict it to *.log. The result also comes back as a single string with the lines separated by '\n', as shown below.
How can I read only the log files, merge them, and get the result line by line?

    import boto3
    import pandas as pd
    import csv
    
    s3 = boto3.resource('s3')
    my_bucket = s3.Bucket('my_bucket')
    
    lst = []
    for object in my_bucket.objects.filter(Prefix="bulk_data/all_files/"):
        print(object.key)
        bdy = object.get()['Body'].read().decode('utf-8')
        lst.append(bdy)
        bdy = ''
    print(lst)

The lst output looks like this, with '\n' as the separator:

    '12345,6006,7290,7200,JKHBJ,S,55\n44345,6996,6290,7288,JKHkk,R,57\n..........'

I should get something like this instead:

    12345,6006,7290,7200,JKHBJ,S,55
    44345,6996,6290,7288,JKHkk,R,57
    ...

The filter accepts only a prefix, not a suffix, so you have to filter the object keys yourself, for example:

    import boto3

    s3 = boto3.resource('s3')
    my_bucket = s3.Bucket('my_bucket')

    lst = []
    for s3obj in my_bucket.objects.filter(Prefix="bulk_data/all_files/"):
        # skip S3 objects whose key does not end with .log
        if not s3obj.key.endswith('.log'):
            continue

        print(s3obj.key)
        bdy = s3obj.get()['Body'].read().decode('utf-8')
        lst.append(bdy)

    # print each file's content line by line instead of as one string
    for file_str in lst:
        for line in file_str.splitlines():
            print(line)
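
If you want the merged lines as a list rather than printed output, the same idea can be wrapped in a small helper. This is only a sketch: the function names iter_matching_lines and read_log_lines are made up for illustration, and it assumes the files are UTF-8 text. The bucket name and prefix are the ones from the question.

```python
from typing import Iterable, Iterator, List


def iter_matching_lines(s3_objects: Iterable, suffix: str = ".log") -> Iterator[str]:
    """Yield every line of every object whose key ends with `suffix`.

    `s3_objects` is anything that yields objects with a `.key` attribute and a
    `.get()` method, such as the result of `bucket.objects.filter(...)`.
    """
    for s3obj in s3_objects:
        # S3 listing filters by prefix only, so the suffix check is done here.
        if not s3obj.key.endswith(suffix):
            continue
        body = s3obj.get()["Body"].read().decode("utf-8")  # assumes UTF-8 text
        # splitlines() handles '\n' and '\r\n' and adds no trailing empty line.
        yield from body.splitlines()


def read_log_lines(bucket_name: str, prefix: str, suffix: str = ".log") -> List[str]:
    """Collect matching lines from a real bucket (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the helper above works without boto3 installed

    bucket = boto3.resource("s3").Bucket(bucket_name)
    return list(iter_matching_lines(bucket.objects.filter(Prefix=prefix), suffix))
```

It could then be called as read_log_lines("my_bucket", "bulk_data/all_files/"), and "\n".join(...) on the returned list gives the merged text with one record per line. Note that each matching object is still read fully into memory before being split.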

