
SyntaxError: invalid predicate using lxml iterfind

I am struggling with an XPath expression. I am trying to retrieve all the system-out nodes that contain the substring "[[SOMETHING|". The problem is that I get the following syntax error, which points to the tree.iterfind call:

    for elem in tree.iterfind('.//system-out[contains(.,"[[SOMETHING|")]'):
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "src/lxml/etree.pyx", line 2288, in lxml.etree._ElementTree.iterfind
  File "src/lxml/etree.pyx", line 1588, in lxml.etree._Element.iterfind
  File "src/lxml/_elementpath.py", line 312, in lxml._elementpath.iterfind
  File "src/lxml/_elementpath.py", line 295, in lxml._elementpath._build_path_iterator
  File "src/lxml/_elementpath.py", line 237, in lxml._elementpath.prepare_predicate
SyntaxError: invalid predicate

from lxml import etree

tree = etree.parse(test_file)  # test_file is the XML file being parsed
for elem in tree.iterfind('.//system-out[contains(.,"[[SOMETHING|")]'):
    print("do something")

The above is my code. As far as I can see, it contains no syntax error. I also tested the XPath expression with a free formatter tool, and it seems to work there. I just can't see what is wrong. I tried the findall function provided by lxml, but I get the same error. I also tried compiling the XPath expression with etree.XPath and using the compiled object instead, but that raised the following TypeError, which makes sense:

TypeError: 'lxml.etree.XPath' object is unsliceable

Is there something I am missing? Or is this simply an expression that the lxml package does not support?

If SOMETHING on its own, rather than [[SOMETHING|, is still unique enough, I'd suggest replacing .//system-out[contains(.,"[[SOMETHING|")] with just this:

'.//system-out[contains(.,"SOMETHING")]'

So the whole line becomes:

for elem in tree.iterfind('.//system-out[contains(.,"SOMETHING")]'):

As Martin Honnen explained in the comments, the find methods (iterfind, find, findall) in ElementTree and lxml do not support the full XPath 1.0 syntax, which explains the SyntaxError: invalid predicate error.
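To make the difference concrete, here is a minimal sketch that runs the same kind of query through both APIs. The XML snippet is made up for illustration (the real report from the question is not shown), so treat the element layout as an assumption. It suggests that a simple predicate like [.='text'] is accepted by findall, while a contains() call is rejected there and only works through xpath().

```python
from lxml import etree

# Hypothetical sample document; the real file from the question is not shown.
XML = b"""<testsuite>
  <testcase name="a"><system-out>log [[SOMETHING|value]] more</system-out></testcase>
  <testcase name="b"><system-out>nothing interesting</system-out></testcase>
  <testcase name="c"><system-out>exact</system-out></testcase>
</testsuite>"""

root = etree.fromstring(XML)

# The find methods accept simple predicates such as [.='text'].
exact = root.findall(".//system-out[.='exact']")

# A function call such as contains() is outside that subset and is rejected.
try:
    root.findall('.//system-out[contains(.,"[[SOMETHING|")]')
    rejected = False
except SyntaxError:
    rejected = True

# xpath() evaluates the expression as XPath 1.0, where contains() is available.
matches = root.xpath('.//system-out[contains(.,"[[SOMETHING|")]')

print(len(exact), rejected, len(matches))
```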

I used lxml's xpath() method (tree.xpath()) instead, which does support the XPath 1.0 syntax. Having retrieved the matching text from the XML file, I then used the result of xpath() to iterate over all of its occurrences with a much simpler XPath expression that iterfind can understand.

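# Text of the first system-out node that contains the marker
occ = tree.xpath('.//system-out[contains(.,"[[SOMETHING|")]')[0].text
# Iterate over every element whose text equals that value
for elem in tree.iterfind(f'.//*[.="{occ}"]'):
    print("do something")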
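As a possible alternative to the two-step approach above, the list returned by xpath() can also be looped over directly, which would cover every matching node rather than only the text of the first one. The sketch below uses made-up sample data, and the name find_marked is hypothetical. It also shows how a compiled etree.XPath object is meant to be used: it is a callable, so you call it with the tree instead of passing it to iterfind, which seems to be what triggered the TypeError in the question.

```python
from lxml import etree

# Hypothetical sample data standing in for the test report from the question.
XML = b"""<testsuite>
  <testcase><system-out>first [[SOMETHING|1]]</system-out></testcase>
  <testcase><system-out>unrelated output</system-out></testcase>
  <testcase><system-out>second [[SOMETHING|2]]</system-out></testcase>
</testsuite>"""

tree = etree.ElementTree(etree.fromstring(XML))
expr = './/system-out[contains(.,"[[SOMETHING|")]'

# xpath() returns a list of matching elements, so it can be looped over directly.
texts = [elem.text for elem in tree.xpath(expr)]

# A compiled XPath object is called with the tree (or an element);
# it is not a path string and cannot be handed to iterfind()/findall().
find_marked = etree.XPath(expr)  # hypothetical name
compiled_texts = [elem.text for elem in find_marked(tree)]

print(texts)
```

Compiling the expression once is mostly useful when the same query is run against many documents; for a single file, calling tree.xpath() directly is probably enough.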
