Reading XML files from an S3 bucket in Python - only the content of the last file is getting stored

I have 4 XML files inside an S3 bucket directory. When I try to read the content of all the files, I find that only the content of the last file (XML4) is stored.

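import boto3
import xml.etree.ElementTree as ET

s3 = boto3.resource('s3')  # assumed setup; not shown in the original post
s3_bucket_name = 'test'
bucket = s3.Bucket(s3_bucket_name)

# Collect the keys of all XML objects under the 'auto' prefix
bucket_list = []
for file in bucket.objects.filter(Prefix='auto'):
    if file.key.endswith('.xml'):
        bucket_list.append(file.key)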

In 'bucket_list', I can see that there are 4 files.

for file in bucket_list:
    obj = s3.Object(s3_bucket_name,file)
    data = (obj.get()['Body'].read())
    
    
tree = ET.ElementTree(ET.fromstring(data))

What changes should be made in the code to read the content of all the XML files?

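In your loop, data is overwritten on every iteration, and the tree is built only once, after the loop has finished, so it only ever sees the last file. Since you have a list of files, you need a corresponding list of trees, and each tree has to be built inside the loop.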

tree_list = []

for file in bucket_list:
    obj = s3.Object(s3_bucket_name, file)
    data = obj.get()['Body'].read()
    # Parse inside the loop so every file gets its own tree
    tree_list.append(ET.ElementTree(ET.fromstring(data)))

Then you can use tree_list for whatever you need. Each entry corresponds to the file at the same index in bucket_list.
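
If you would rather look trees up by file name than by position, one possible variation is to key them by the S3 object key. This is only a sketch: load_xml_trees is a hypothetical helper name, not part of boto3, and it assumes the bucket argument behaves like a boto3 Bucket resource (that is, bucket.objects.filter(Prefix=...) yields object summaries that have a .key attribute and a .get() method).

```python
import xml.etree.ElementTree as ET


def load_xml_trees(bucket, prefix):
    """Return a dict mapping each .xml object key under `prefix` to its parsed tree.

    `bucket` is assumed to behave like a boto3 Bucket resource.
    `load_xml_trees` is a hypothetical helper, not a boto3 function.
    """
    trees = {}
    for summary in bucket.objects.filter(Prefix=prefix):
        # Skip anything that is not an XML file
        if not summary.key.endswith(".xml"):
            continue
        # Read and parse inside the loop so no file overwrites another
        data = summary.get()["Body"].read()
        trees[summary.key] = ET.ElementTree(ET.fromstring(data))
    return trees


# Possible usage with the bucket from the question (requires boto3 and credentials):
#   import boto3
#   bucket = boto3.resource("s3").Bucket("test")
#   trees = load_xml_trees(bucket, "auto")
#   for key, tree in trees.items():
#       print(key, tree.getroot().tag)
```

Using a dict keeps the file name and its tree together, so you do not have to keep bucket_list and tree_list in sync by index.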
