How to search for certain strings within an XML response

I am using the urllib2 library to access an S3 bucket I own. I get an XML structure back. The problem is that I want to find the nodes in that structure whose Key names a file starting with "part-" (for example, output/part-00000).

I then want to extract those keys, save them in a list, and loop through them afterwards to read the contents of those files.
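The "loop through them and read each file" step could look roughly like the sketch below. It is only an illustration: `read_parts`, `base_url`, and `fetch` are hypothetical names, the bucket URL is a placeholder, and the fetcher is a stand-in so the snippet runs without network access.

```python
def read_parts(keys, base_url, fetch):
    """Return a dict mapping each key to whatever fetch() returns for its URL."""
    contents = {}
    for key in keys:
        # Join the bucket URL and the key with exactly one slash.
        url = base_url.rstrip("/") + "/" + key
        contents[key] = fetch(url)
    return contents


def fake_fetch(url):
    # Stand-in for a real download; with urllib2 one might pass
    # lambda url: urllib2.urlopen(url).read() instead (assumption).
    return "contents of " + url


result = read_parts(["output/part-00000"], "https://bucket.example/", fake_fetch)
```

Passing the fetcher in as an argument keeps the looping logic separate from the download call, so it can be tried out before pointing it at the real bucket.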

Part of the XML response:

<Contents>
<Key>output/part-00000</Key>
<LastModified>2016-05-11T17:01:19.000Z</LastModified>
<ETag>"d41d8cd98f00b204e9800998ecf8427e"</ETag>
<Size>0</Size>
<StorageClass>STANDARD</StorageClass>
</Contents>
<Contents>
<Key>output/part-00001</Key>
<LastModified>2016-05-11T17:01:15.000Z</LastModified>
<ETag>"d41d8cd98f00b204e9800998ecf8427e"</ETag>
<Size>0</Size>
<StorageClass>STANDARD</StorageClass>
</Contents>

Right now I am doing the following:

import urllib2
import xml.etree.ElementTree as ET

f = urllib2.urlopen("https://s3.amazonaws.com/*******")

tree = ET.parse(f)
root = tree.getroot()

for child in root:
    print child

Output:

<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Name' at 0x103a325d0>
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Prefix' at 0x103a32610>
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Marker' at 0x103a32690>
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}MaxKeys' at 0x103a32710>
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}IsTruncated' at 0x103a32750>
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a32790>
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a32950>
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a32b10>
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a32cd0>
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a32e90>
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a3e090>
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a3e250>
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a3e410>
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a3e5d0>
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a3e790>
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a3e950>
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a3eb10>
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a3ecd0>
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a3ee90>
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a47090>
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a47250>
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a47410>
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a475d0>
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a47790>
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a47950>
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a47b10>
<Element '{http://s3.amazonaws.com/doc/2006-03-01/}Contents' at 0x103a47cd0>
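The `{http://s3.amazonaws.com/doc/2006-03-01/}` prefix on every tag in this output is the XML namespace of the response, which is probably why unqualified lookups such as `findall('Contents')` come back empty. A minimal sketch of the difference, assuming a trimmed-down stand-in for the response (the `SAMPLE` string, the `ListBucketResult` root element, and the bucket name are assumptions, since the question does not show them):

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened stand-in for the S3 listing response.
SAMPLE = """<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>example-bucket</Name>
  <Contents><Key>output/part-00000</Key></Contents>
  <Contents><Key>output/part-00001</Key></Contents>
</ListBucketResult>"""

root = ET.fromstring(SAMPLE)

# An unqualified tag name does not match namespaced elements.
unqualified = root.findall("Contents")

# Mapping a prefix to the namespace URI lets findall resolve the tag.
NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}
qualified = root.findall("s3:Contents", NS)

print("unqualified: %d, qualified: %d" % (len(unqualified), len(qualified)))
```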

I have tried various solutions using minidom and xml.etree.ElementTree, but I cannot quite get it right.

So what I want is to loop through those XML nodes, find all references to part-*****, and save them in an array.

Any help or clues are welcome.

My solution:

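import urllib2
import xml.etree.ElementTree as ET

# Every tag in the response lives in this namespace.
NS = '{http://s3.amazonaws.com/doc/2006-03-01/}'

f = urllib2.urlopen("https://s3.amazonaws.com/******")

tree = ET.parse(f)
root = tree.getroot()

# Walk each Contents entry and print the Key it holds.
for child in root.findall(NS + 'Contents'):
    for key in child.findall(NS + 'Key'):
        print key.text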
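The loop above prints every key. To keep only the part files in a list, as the question asks, one option is to test the last path segment of each key. This is a sketch rather than a definitive answer: `part_keys` and `SAMPLE` are hypothetical names, the `ListBucketResult` root and the `_SUCCESS` entry are assumptions added for illustration, and the XML is parsed from a string so the snippet runs offline.

```python
import xml.etree.ElementTree as ET

NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def part_keys(root, prefix="part-"):
    """Return the text of each Contents/Key whose file name starts with prefix."""
    keys = []
    for key in root.findall("s3:Contents/s3:Key", NS):
        text = key.text or ""
        # Keys look like "output/part-00000", so compare the last path segment.
        if text.rsplit("/", 1)[-1].startswith(prefix):
            keys.append(text)
    return keys


# Hypothetical, shortened stand-in for the S3 listing response.
SAMPLE = """<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Contents><Key>output/_SUCCESS</Key></Contents>
  <Contents><Key>output/part-00000</Key></Contents>
  <Contents><Key>output/part-00001</Key></Contents>
</ListBucketResult>"""

parts = part_keys(ET.fromstring(SAMPLE))
```

With the real response, the same function could be called on the `root` obtained from `ET.parse(f).getroot()`.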
