How to check xmlns in every element using lxml

I am using lxml to check Product elements as they stream through a MapReduce job. I want to make sure that every element carries only the correct xmlns value. For example, every Product element should have its xmlns set to "http://mynetwork.products.com/new":

<Product xmlns="http://mynetwork.products.com/new">

As I check each Product element (streamed one at a time), I just want to make sure that it looks like the above. I want to check for the following potential errors:

  1. Incorrect xmlns URL:

<Product xmlns="http://mynetwork.products.com/old">

  2. Missing URL:

<Product xmlns="">

  3. Missing xmlns key/value pair:

<Product>

  4. Extra attribute in the Product element:

<Product xmlns="http://mynetwork.products.com/new" something="else">

I tried storing the value of Product.nsmap (which is a dictionary) for each element and then reading the dictionary's values to validate, but that doesn't help me detect any of the above cases. There must be a way.

You can check a combination of the nsmap and attrib properties of each Product element. nsmap should contain only one key/value pair, i.e. the key None with the value "http://mynetwork.products.com/new", and attrib should be empty, since you don't allow any attributes on the element.

Brief example (Python 2.7):

>>> from lxml import etree
>>> raw = '''<root>
... <Product xmlns="http://mynetwork.products.com/new"/>
... <Product xmlns="http://mynetwork.products.com/new" something="else"/>
... <Product xmlns="http://mynetwork.products.com/old" />
... <Product xmlns=""/>
... <Product/>
... </root>'''
... 
>>> root = etree.fromstring(raw)
>>> for p in root.findall('*'):
...     isValid = len(p.nsmap) == 1 \
...         and p.nsmap[None] == 'http://mynetwork.products.com/new' \
...         and not p.attrib
...     print isValid
... 
True
False
False
False
False
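
If it helps to know which of the four errors occurred rather than just getting True/False, the same nsmap/attrib check can be split into separate tests. The following is only a sketch in Python 3 under a few assumptions: the names EXPECTED_NS and product_problems are made up for illustration, and each Product is assumed to be parsed on its own (as in the streaming case), because nsmap also includes namespace declarations inherited from ancestor elements. A missing xmlns and an empty xmlns="" are reported together here.

```python
from lxml import etree

# Assumed constant name; the URL is the one from the question.
EXPECTED_NS = "http://mynetwork.products.com/new"


def product_problems(element):
    """Return a list of problems found on one Product element (empty list = valid)."""
    problems = []

    # The key None in nsmap holds the default (unprefixed) namespace, if any.
    default_ns = element.nsmap.get(None)
    if not default_ns:
        # Covers both <Product> and <Product xmlns="">.
        problems.append("missing or empty xmlns")
    elif default_ns != EXPECTED_NS:
        problems.append("incorrect xmlns URL: " + default_ns)

    # Any other key is a prefixed declaration such as xmlns:x="...".
    extra_prefixes = sorted(prefix for prefix in element.nsmap if prefix is not None)
    if extra_prefixes:
        problems.append("extra namespace prefixes: " + ", ".join(extra_prefixes))

    # attrib holds ordinary attributes; xmlns declarations do not appear in it.
    if element.attrib:
        problems.append("extra attributes: " + ", ".join(sorted(element.attrib)))

    return problems


samples = [
    '<Product xmlns="http://mynetwork.products.com/new"/>',
    '<Product xmlns="http://mynetwork.products.com/new" something="else"/>',
    '<Product xmlns="http://mynetwork.products.com/old"/>',
    '<Product xmlns=""/>',
    '<Product/>',
]

for text in samples:
    # Each sample is parsed standalone, so nsmap reflects only this element.
    element = etree.fromstring(text)
    print(text, "->", product_problems(element) or "OK")
```

Returning a list instead of a boolean lets the job log or count each kind of failure separately; `not product_problems(element)` gives the same yes/no answer as isValid above.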
