I'm using Python + lxml to parse an SPSS file.
There seem to be many threads on this topic, but the answers don't particularly help me.
The answers I have come across:
- lower-case the entire input before parsing;
- if you know the complete list of tags in advance
For me these suggestions would take too much time.
Instead I would like to match strings only when necessary.
Here is the line of code I would like to edit:
xpath("//definition//variable[@name='"+tag_name+"']")
How can I get a hit if tag_name is:
tag_name = "Q1top"
tag_name = "q1Top"
tag_name = "q1TOP"
etc
I'm guessing some form of regex would be in order?
Alternatively, you can use the regex functions from the http://exslt.org/regular-expressions
namespace in the XPath, for example:
ns = {"re": "http://exslt.org/regular-expressions"}
query = "//definition//variable[re:test(@name, '^{0}$', 'i')]".format(tag_name)
result = tree.xpath(query, namespaces=ns)
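The snippet above interpolates tag_name straight into the pattern, so a name containing regex metacharacters (such as a dot) or a quote could match more than intended or break the query. Below is a minimal sketch of a more defensive variant. It assumes that lxml evaluates re:test with Python's re module (which, as far as I know, is how lxml implements the EXSLT regex functions), so re.escape is a suitable way to quote the name. The sample document and the helper name find_variables are made up for illustration; the real SPSS XML layout may differ.

```python
import re
from lxml import etree

# Hypothetical sample data; only the definition/variable/@name shape
# is taken from the question.
SAMPLE_XML = b"""<root>
  <definition>
    <variable name="Q1top"/>
    <variable name="q1TOP"/>
    <variable name="q1topX"/>
  </definition>
</root>"""

EXSLT_RE = {"re": "http://exslt.org/regular-expressions"}


def find_variables(tree, tag_name):
    """Return <variable> elements whose @name equals tag_name, ignoring case."""
    # Escape regex metacharacters so tag_name is matched literally.
    pattern = "^" + re.escape(tag_name) + "$"
    # Pass the pattern as an XPath variable ($pat) instead of formatting it
    # into the query string, which avoids quoting problems.
    return tree.xpath(
        "//definition//variable[re:test(@name, $pat, 'i')]",
        namespaces=EXSLT_RE,
        pat=pattern,
    )


root = etree.fromstring(SAMPLE_XML)
print([v.get("name") for v in find_variables(root, "Q1TOP")])
# prints ['Q1top', 'q1TOP']
```

Passing the pattern through an XPath variable keeps the query string constant, so the same expression can be reused for every tag_name.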
Using the local-name and translate XPath functions:
>>> import lxml.etree
>>>
>>> root = lxml.etree.fromstring('''
... <root>
... <parent>
... <Q1top>1</Q1top>
... <q1Top>2</q1Top>
... <q1TOP>3</q1TOP>
... </parent>
... </root>
... ''')
>>> root.xpath('.//*[translate(local-name(), '
... '"ABCDEFGHIJKLMNOPQRSTUVWXYZ", '
... '"abcdefghijklmnopqrstuvwxyz")="q1top"]')
[<Element Q1top at 0x7fd663354a28>,
<Element q1Top at 0x7fd663354830>,
<Element q1TOP at 0x7fd6633549e0>]
UPDATE: translate only needs to list the letters that actually occur in the target name:
>>> root.xpath('.//*[translate(local-name(), '
... '"QTOP", '
... '"qtop")="q1top"]')
[<Element Q1top at 0x7fd663354a28>,
<Element q1Top at 0x7fd663354830>,
<Element q1TOP at 0x7fd6633549e0>]
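The examples above match on element names, whereas the query in the question compares the name attribute. The same translate trick can be applied to @name, as in the rough sketch below. The sample document and the helper name find_by_name_ci are assumptions for illustration, and the approach only folds ASCII letters, since XPath 1.0 has no lower-case function.

```python
import string
from lxml import etree

# Hypothetical sample data; only the definition/variable/@name shape
# is taken from the question.
SAMPLE_XML = b"""<root>
  <definition>
    <variable name="Q1top"/>
    <variable name="q1TOP"/>
    <variable name="q2top"/>
  </definition>
</root>"""

UPPER = string.ascii_uppercase
LOWER = string.ascii_lowercase


def find_by_name_ci(tree, tag_name):
    """Return <variable> elements whose @name equals tag_name, ignoring ASCII case."""
    # translate() maps each upper-case letter in @name to its lower-case
    # counterpart; the result is compared with the lower-cased tag_name.
    query = "//definition//variable[translate(@name, $upper, $lower) = $wanted]"
    return tree.xpath(query, upper=UPPER, lower=LOWER, wanted=tag_name.lower())


root = etree.fromstring(SAMPLE_XML)
print([v.get("name") for v in find_by_name_ci(root, "q1Top")])
# prints ['Q1top', 'q1TOP']
```

Because the alphabet strings and the wanted value are passed as XPath variables, no string concatenation is needed and tag_name may safely contain quotes.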