Trying to parse XML from string into Python
So first, the string:
'<?xml version="1.0" encoding="UTF-8"?><metalink version="3.0" xmlns="http://www.metalinker.org/" xmlns:lcgdm="LCGDM:" generator="lcgdm-dav" pubdate="Fri, 11 Oct 2013 12:46:10 GMT"><files><file name="/lhcb/L"><size>173272912</size><resources><url type="https">https://test-kit.test.de:2880/pnfs/test.file</url><url type="https">https://test.grid.sara.nl:2882/pnfs/test.file</url></resources></file></files></metalink>'
What I want to extract is the url text. The following code works but has flaws because it's hard-coded:
import xml.etree.ElementTree as ET

root = ET.fromstring(xml_string)
for entry in root[0][0][1].iter():
    print(entry.text)
So this only works if the XML structure stays the same. I tried to use XPath, and also searching by tag name, but I never got any results.
Is it a problem with the format of the XML string, or am I doing something wrong?
You can use XPath (through the element's findall function) to get the URLs, but since the root element declares xmlns="http://www.metalinker.org/", you need to include that namespace in the XPath as well.
Example:
>>> from xml.etree.ElementTree import fromstring
>>> root = fromstring(xml_string)
>>> urls = root.findall('.//{http://www.metalinker.org/}url')
>>> for url in urls:
...     print(url.text)
...
https://test-kit.test.de:2880/pnfs/test.file
https://test.grid.sara.nl:2882/pnfs/test.file
The above XPath will find all url elements in the XML.
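As an alternative sketch, findall also accepts a prefix-to-URI mapping as its second argument, which can keep longer paths readable. The prefix `m` and the shortened `xml_string` below are illustrative choices, not part of the original post; only the namespace URI has to match the document.

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for the document in the question (same namespace URI).
xml_string = (
    '<metalink version="3.0" xmlns="http://www.metalinker.org/">'
    '<files><file name="/lhcb/L"><resources>'
    '<url type="https">https://test-kit.test.de:2880/pnfs/test.file</url>'
    '<url type="https">https://test.grid.sara.nl:2882/pnfs/test.file</url>'
    '</resources></file></files></metalink>'
)

# "m" is an arbitrary prefix picked for this sketch.
ns = {"m": "http://www.metalinker.org/"}

root = ET.fromstring(xml_string)
# The prefix in the path is expanded to the full namespace URI via the mapping.
urls = [url.text for url in root.findall(".//m:url", ns)]
print(urls)
```

The mapping is only used to expand the path; the document itself does not need to use the same prefix.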
You used namespaces, so you need to use them in XPath:
for entry in root.findall('.//{http://www.metalinker.org/}url'):
    print(entry.text)
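To see why a bare url path returns nothing, it can help to look at the tag names ElementTree actually stores. A small sketch, assuming a trimmed, hypothetical version of the document from the question:

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for the document in the question (same namespace URI).
xml_string = (
    '<metalink xmlns="http://www.metalinker.org/">'
    '<files><file name="/lhcb/L"><resources>'
    '<url type="https">https://test-kit.test.de:2880/pnfs/test.file</url>'
    '</resources></file></files></metalink>'
)
root = ET.fromstring(xml_string)

# The default namespace is folded into every tag name as "{uri}localname".
print(root.tag)

# An unqualified path therefore matches no element ...
unqualified = root.findall(".//url")
# ... while the namespace-qualified path finds it.
qualified = root.findall(".//{http://www.metalinker.org/}url")
print(len(unqualified), len(qualified))
```

Printing the tag of any element is a quick way to find the exact namespace URI a path has to include.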