Can't access child elements in XML
I am trying to parse an XML string into plain Python objects. I tried to use the find and findall methods to access some child elements, but they don't work.
Here is the XML data I'm trying to parse:
<?xml version="1.0" ?>
<ItemSearchResponse
xmlns="http://webservices.amazon.com/AWSECommerceService/2011-08-01">
<Items>
<Request>
<IsValid>True</IsValid>
<ItemSearchRequest>
<Keywords>iphone</Keywords>
<ResponseGroup>ItemAttributes</ResponseGroup>
<SearchIndex>All</SearchIndex>
</ItemSearchRequest>
</Request>
<TotalResults>40721440</TotalResults>
<TotalPages>4072144</TotalPages>
<Item>
<ASIN>B00YV50QU4</ASIN>
<ParentASIN>B018GTHAKO</ParentASIN>
<DetailPageURL>http://www.amazon.com/Apple-iPhone-MD439LL-Smartphone-Refurbished/dp/B00YV50QU4%3Fpsc%3D1%26SubscriptionId%3DAKIAIEEA4BKMTHTI2T7A%26tag%3Dshopit021-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D165953%26creativeASIN%3DB00YV50QU4</DetailPageURL>
<ItemLinks>
</ItemLinks>
<ItemAttributes>
</ItemAttributes>
</Item>
<Item>
<ASIN>B00VHSXBUA</ASIN>
<ParentASIN>B0152TROY8</ParentASIN>
<ItemAttributes>
</ItemAttributes>
</Item>
</Items>
</ItemSearchResponse>
I've deleted some data to make this sample shorter.
And here is my code:
data = et.fromstring(response)
items = data[0][3]
print items.tag
items = data[0].findall('item')
print len(items.findall('.//item'))
The first way to access the child nodes ('item') is list index notation, and it works fine. The findall method does not work, though: len() always returns 0.
I've tried XPath and other approaches, but using the index is the only way I can get it to work.
Why don't methods like find and findall work?
Because there are no elements named Item. Your document defines a default XML namespace of http://webservices.amazon.com/AWSECommerceService/2011-08-01, which means that an element that looks like <Item> in your document is actually contained in that namespace. It is different from an element that looks like <Item> in a document without a default XML namespace (or with a different XML namespace).
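To see the namespace effect directly, it can help to print the tags the parser actually stores. The sketch below assumes the standard library's xml.etree.ElementTree (the parser used in the question may be lxml instead) and a trimmed copy of the document; SAMPLE and the other variable names are illustrative only. It also shows that element names are case-sensitive, so 'item' would not match <Item> even if there were no namespace.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the response from the question (assumption: same structure).
SAMPLE = """<?xml version="1.0" ?>
<ItemSearchResponse xmlns="http://webservices.amazon.com/AWSECommerceService/2011-08-01">
  <Items>
    <TotalResults>40721440</TotalResults>
    <Item><ASIN>B00YV50QU4</ASIN></Item>
    <Item><ASIN>B00VHSXBUA</ASIN></Item>
  </Items>
</ItemSearchResponse>"""

root = ET.fromstring(SAMPLE)
items_parent = root[0]  # the <Items> element

# Each tag is stored as "{namespace-uri}localname", not as the bare name.
tags = [child.tag for child in items_parent]
print(tags)

# Bare names match nothing: the namespace is missing,
# and 'item' additionally has the wrong case.
plain_matches = items_parent.findall('Item')
lower_matches = items_parent.findall('item')
print(len(plain_matches), len(lower_matches))
```

Index access such as data[0][3] keeps working because it never compares tag names at all.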
You want something like:
>>> ns = 'http://webservices.amazon.com/AWSECommerceService/2011-08-01'
>>> items = data[0].findall('{%s}Item' % ns)
>>> items
[<Element {http://webservices.amazon.com/AWSECommerceService/2011-08-01}Item at 0x7f1cbaaba8c0>, <Element {http://webservices.amazon.com/AWSECommerceService/2011-08-01}Item at 0x7f1cbaaba680>]
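As an alternative to formatting the URI into every path, findall also accepts a prefix-to-URI mapping. The following is a minimal sketch assuming the standard library's xml.etree.ElementTree and a trimmed copy of the document; the prefix 'n' and the names SAMPLE and NS_URI are arbitrary choices for illustration. The '{*}' wildcard shown at the end requires Python 3.8 or newer.

```python
import xml.etree.ElementTree as ET

NS_URI = 'http://webservices.amazon.com/AWSECommerceService/2011-08-01'

# Trimmed copy of the response from the question (assumption: same structure).
SAMPLE = """<?xml version="1.0" ?>
<ItemSearchResponse xmlns="http://webservices.amazon.com/AWSECommerceService/2011-08-01">
  <Items>
    <Item><ASIN>B00YV50QU4</ASIN></Item>
    <Item><ASIN>B00VHSXBUA</ASIN></Item>
  </Items>
</ItemSearchResponse>"""

root = ET.fromstring(SAMPLE)

# Map a prefix of our own choosing to the namespace URI.
ns = {'n': NS_URI}
items = root[0].findall('n:Item', ns)
print(len(items))

# Python 3.8+: '{*}' matches the local name in any namespace.
any_ns_items = root[0].findall('{*}Item')
print(len(any_ns_items))
```

The prefix only exists in the query; it does not have to match anything written in the document itself.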
Or, using XPath:
>>> items = data[0].xpath('n:Item', namespaces={'n': ns})
>>> items
[<Element {http://webservices.amazon.com/AWSECommerceService/2011-08-01}Item at 0x7f1cbaaba8c0>, <Element {http://webservices.amazon.com/AWSECommerceService/2011-08-01}Item at 0x7f1cbaaba680>]
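Note that the xpath method is provided by lxml; the standard library's ElementTree has no xpath method, although its findall supports a limited XPath subset. Since the goal in the question is to end up with plain Python objects, here is a hedged sketch of how the matched elements might be turned into dictionaries. It assumes the standard library parser and a trimmed copy of the document; the dictionary keys and variable names are made up for this example.

```python
import xml.etree.ElementTree as ET

NS = {'n': 'http://webservices.amazon.com/AWSECommerceService/2011-08-01'}

# Trimmed copy of the response from the question (assumption: same structure).
SAMPLE = """<?xml version="1.0" ?>
<ItemSearchResponse xmlns="http://webservices.amazon.com/AWSECommerceService/2011-08-01">
  <Items>
    <Item>
      <ASIN>B00YV50QU4</ASIN>
      <ParentASIN>B018GTHAKO</ParentASIN>
    </Item>
    <Item>
      <ASIN>B00VHSXBUA</ASIN>
      <ParentASIN>B0152TROY8</ParentASIN>
    </Item>
  </Items>
</ItemSearchResponse>"""

root = ET.fromstring(SAMPLE)

# './/n:Item' searches all descendants, so no positional indexing is needed.
products = []
for item in root.findall('.//n:Item', NS):
    products.append({
        'asin': item.findtext('n:ASIN', namespaces=NS),
        'parent_asin': item.findtext('n:ParentASIN', namespaces=NS),
    })

print(products)
```

Every child lookup needs the namespace as well, which is why findtext receives the same mapping.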