I am using lxml to check Product elements as they stream in a MapReduce job. I am trying to make sure that every element carries only the correct xmlns value. For example, every Product element should have its xmlns set to "http://mynetwork.products.com/new":
<Product xmlns="http://mynetwork.products.com/new">
As I check each Product element (streamed one at a time), I just want to make sure that it looks like the above. I want to check for the following potential errors:
<Product xmlns="http://mynetwork.products.com/old">
<Product xmlns="">
<Product>
<Product xmlns="http://mynetwork.products.com/new" something="else">
I tried storing the value of Product.nsmap for each element (which is a dictionary) and then reading the values of the dictionary to validate, but that doesn't help me detect any of the cases above. There must be a way.
You can check a combination of the nsmap and attrib properties of each Product element. nsmap should contain only one key-value pair, i.e. the key None with the value "http://mynetwork.products.com/new", and attrib should be empty, since you don't allow any attributes on the element.
Brief example (Python 2.7):
>>> from lxml import etree
>>> raw = '''<root>
... <Product xmlns="http://mynetwork.products.com/new"/>
... <Product xmlns="http://mynetwork.products.com/new" something="else"/>
... <Product xmlns="http://mynetwork.products.com/old" />
... <Product xmlns=""/>
... <Product/>
... </root>'''
>>> root = etree.fromstring(raw)
>>> for p in root.findall('*'):
...     # valid only if the single known namespace is the expected default one and there are no attributes
...     isValid = len(p.nsmap) == 1 \
...         and p.nsmap[None] == 'http://mynetwork.products.com/new' \
...         and not p.attrib
...     print(isValid)
...
True
False
False
False
False
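Since the question describes Product elements arriving one at a time, the same idea can be wrapped in a small helper. The following is only a sketch of a related variant, not the check used above: instead of inspecting nsmap, it compares the element's namespace-qualified tag (lxml and the standard library both report it as "{uri}Product") and requires attrib to be empty. The names is_valid_product, EXPECTED_NS and EXPECTED_TAG are hypothetical, and the helper assumes each Product arrives as a standalone XML string.

```python
try:
    from lxml import etree  # the library used in the answer above
except ImportError:
    # Assumption: falling back to the standard library is acceptable here,
    # because this sketch only uses calls that both libraries provide.
    import xml.etree.ElementTree as etree

EXPECTED_NS = "http://mynetwork.products.com/new"
EXPECTED_TAG = "{%s}Product" % EXPECTED_NS  # namespace-qualified tag name


def is_valid_product(xml_text):
    """Return True if xml_text is one Product element in the expected
    namespace that carries no attributes (hypothetical helper)."""
    try:
        elem = etree.fromstring(xml_text)
    except etree.ParseError:
        # Malformed input is treated as invalid rather than raised.
        return False
    # xmlns declarations are not part of attrib, so any entry here is
    # an ordinary attribute such as something="else".
    return elem.tag == EXPECTED_TAG and not elem.attrib


if __name__ == "__main__":
    samples = [
        '<Product xmlns="http://mynetwork.products.com/new"/>',
        '<Product xmlns="http://mynetwork.products.com/new" something="else"/>',
        '<Product xmlns="http://mynetwork.products.com/old"/>',
        '<Product xmlns=""/>',
        '<Product/>',
    ]
    for sample in samples:
        print(is_valid_product(sample))
```

One trade-off to keep in mind: lxml's nsmap also includes namespace declarations inherited from ancestor elements, so the len(p.nsmap) == 1 test can reject an otherwise correct Product that sits under a parent declaring other namespaces. The tag comparison does not have that problem, but it is looser in the other direction: it also accepts a prefixed form such as <p:Product xmlns:p="http://mynetwork.products.com/new"/> and ignores extra, unused namespace declarations on the element.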