How to search for certain strings in an XML response

Question

I am using the urllib2 library to access an S3 bucket I own, and I get an XML structure back. The problem is that I want to find the nodes in that structure whose Key starts with "part-".

I then want to extract those keys, save them in a list or array, and loop through them afterwards to read the contents of those files.

Part of the XML response:

<Contents>
<Key>output/part-00000</Key>
<LastModified>2016-05-11T17:01:19.000Z</LastModified>
<ETag>"d41d8cd98f00b204e9800998ecf8427e"</ETag>
<Size>0</Size>
<StorageClass>STANDARD</StorageClass>
</Contents>
<Contents>
<Key>output/part-00001</Key>
<LastModified>2016-05-11T17:01:15.000Z</LastModified>
<ETag>"d41d8cd98f00b204e9800998ecf8427e"</ETag>
<Size>0</Size>
<StorageClass>STANDARD</StorageClass>
</Contents>

Right now I am doing the following:

import urllib2
import xml.etree.ElementTree as ET

f = urllib2.urlopen("https://s3.amazonaws.com/*******")

tree = ET.parse(f)
root = tree.getroot()

for child in root:
    print child

Output:

<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Name' at 0x103a325d0>
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Prefix' at 0x103a32610>
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Marker' at 0x103a32690>
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}MaxKeys' at 0x103a32710>
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}IsTruncated' at 0x103a32750>
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a32790>
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a32950>
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a32b10>
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a32cd0>
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a32e90>
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a3e090>
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a3e250>
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a3e410>
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a3e5d0>
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a3e790>
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a3e950>
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a3eb10>
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a3ecd0>
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a3ee90>
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a47090>
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a47250>
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a47410>
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a475d0>
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a47790>
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a47950>
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a47b10>
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a47cd0>

I have tried various solutions using minidom and xml.etree.ElementTree, but I cannot quite get it right.

So what I want is to loop through those XML nodes, find all references to part-*****, and save them in an array.

Any help or clues are welcome.

Answer 1

My solution:

import urllib2
import xml.etree.ElementTree as ET

f = urllib2.urlopen("https://s3.amazonaws.com/******")

tree = ET.parse(f)
root = tree.getroot()

# Tags are namespace-qualified, so the full {namespace}Tag form is required.
for child in root.findall('{http://s3.amazonaws.com/doc/2006-03-01/}Contents'):
    for key in child.findall("{http://s3.amazonaws.com/doc/2006-03-01/}Key"):
        print key.text
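
The loop above prints every Key, while the question asks for only the keys that start with "part-", collected in a list. A minimal sketch of that filtering step follows. It is not part of the original answer: the helper name `find_part_keys`, the inline `SAMPLE_XML` (including its bucket name and the `_SUCCESS` entry) and the `s3` prefix are made up for illustration, and the sample stands in for the real response so the snippet runs without network access. Because the keys in the question look like `output/part-00000`, the sketch assumes the prefix should be matched against the last path segment.

```python
import xml.etree.ElementTree as ET

# Namespace copied from the element tags printed in the question.
# The "s3" prefix is an arbitrary local alias chosen for this sketch.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}

# Hypothetical sample standing in for the real bucket listing.
SAMPLE_XML = """<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>example-bucket</Name>
  <Contents><Key>output/_SUCCESS</Key><Size>0</Size></Contents>
  <Contents><Key>output/part-00000</Key><Size>0</Size></Contents>
  <Contents><Key>output/part-00001</Key><Size>0</Size></Contents>
</ListBucketResult>"""


def find_part_keys(root, prefix="part-"):
    """Return the keys whose last path segment starts with `prefix`."""
    keys = []
    for key in root.findall("s3:Contents/s3:Key", S3_NS):
        text = key.text or ""
        # Keys look like "output/part-00000", so compare the file name only.
        if text.rsplit("/", 1)[-1].startswith(prefix):
            keys.append(text)
    return keys


root = ET.fromstring(SAMPLE_XML)
part_keys = find_part_keys(root)
print(part_keys)
```

With a live response, `root` would instead come from `ET.parse(f).getroot()` as in the answer, and the returned list can then be looped over to fetch each file.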

Answer 1
0 2016-05-11 22:58:40