XPath parent node's class should not contain specific string

I'm trying to find all divs whose class is 'phrase' and whose parent node's class is not 'extras'.

So in Python I'm using

for phrase in entry.iterfind(".//div[@class='phrase'] and ./parent::div[@class!='extras']]"):

to do that.

But it gives me the error:

SyntaxError: prefix 'parent' not found in prefix map

And I changed the above code to

for phrase in entry.iterfind(".//div[@class='phrase'] and ./..[@class!='extras']]"):

This time the error was

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/xml/etree/ElementPath.py", line 272, in iterfind
    selector = _cache[cache_key]
KeyError: (".//div[@class='phrase'] and ./..[@class!='extras']]", None)

Part of the XML structure is as follows:

<div class="phrases">
    <div class="label">Phrases</div>
    <div class="phrase">
    ……

<div class="phrasal verbs">
    <div class="label">Phrases</div>
    <div class="phrase">
    ……

<div class="extras">
    <h2>test test</h2>
    <div class="phrase">
    ……

I'm using Python 3.7 and the xml.etree library on macOS 10.14.

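The problem is most likely your current tool: xml.etree.ElementTree supports only a limited subset of XPath. It has no axes such as parent::, no "and" inside predicates, and the [@attrib!='value'] test was only added in Python 3.10. Note also that both of your expressions close the first predicate too early: the "and" has to sit inside the brackets, as in [@class='phrase' and ...].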
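If you would rather stay with xml.etree, one possible workaround is to do the parent check in Python instead of in the path expression. The following is only a sketch: the wrapping <root> element and the helper name phrases_outside_extras are assumptions made so the fragment parses and runs on its own, and the "……" parts of your markup are replaced by placeholder text.

```python
import xml.etree.ElementTree as ET

# Sample markup modelled on the question. The <root> wrapper is an assumption
# added so that the fragment parses as a single XML document.
SOURCE = """<root>
<div class="phrases">
    <div class="label">Phrases</div>
    <div class="phrase">this</div>
</div>
<div class="phrasal verbs">
    <div class="label">Phrases</div>
    <div class="phrase">this</div>
</div>
<div class="extras">
    <h2>test test</h2>
    <div class="phrase">not this</div>
</div>
</root>"""

entry = ET.fromstring(SOURCE)


def phrases_outside_extras(root):
    """Yield div.phrase elements whose parent div has a class other than 'extras'."""
    for parent in root.iter("div"):
        parent_class = parent.get("class")
        # Mirror the XPath predicate: a parent without a class attribute,
        # or with class 'extras', does not qualify.
        if parent_class is None or parent_class == "extras":
            continue
        # A path without a leading './/' selects direct children only.
        yield from parent.findall("div[@class='phrase']")


texts = [phrase.text for phrase in phrases_outside_extras(entry)]
print(texts)
```

This uses only path features that ElementTree has supported for a long time (a tag name plus an attribute-equals predicate), so it should not depend on the Python version.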

You can try lxml.html to parse the same HTML document:

from lxml import html

source = """<div class="phrases">
                <div class="label">Phrases</div>
                <div class="phrase">this</div>
            </div>

            <div class="phrasal verbs">
                <div class="label">Phrases</div>
                <div class="phrase">this</div>
            </div>

            <div class="extras">
                <h2>test test</h2>
                <div class="phrase">not this</div>
            </div>"""

dom = html.fromstring(source)
dom.xpath(".//div[@class='phrase' and ./parent::div[@class!='extras']]")

Output:

[<Element div at 0x7fb5218d5db8>, <Element div at 0x7fb521018728>] #  Two elements found

or

dom.xpath(".//div[@class='phrase' and ./parent::div[@class!='extras']]/text()")

Output:

['this', 'this']

You can use something like "//div[@class!='extras']/div[@class='phrase']". It should find all divs of class 'phrase' whose parent's class is not 'extras'.
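As a hedged illustration of that expression with the standard library: ElementTree documents the [@attrib!='value'] predicate as new in Python 3.10, so the sketch below checks the interpreter version and falls back to filtering parents in Python on older releases. The <root> wrapper, the PATH name and the leading "." (ElementTree expects relative paths on an element) are assumptions for this example.

```python
import sys
import xml.etree.ElementTree as ET

# Sample markup modelled on the question; <root> is an assumed wrapper element.
SOURCE = """<root>
<div class="phrases"><div class="phrase">this</div></div>
<div class="phrasal verbs"><div class="phrase">this</div></div>
<div class="extras"><div class="phrase">not this</div></div>
</root>"""

root = ET.fromstring(SOURCE)

# Relative form of the suggested expression.
PATH = ".//div[@class!='extras']/div[@class='phrase']"

if sys.version_info >= (3, 10):
    # The != attribute predicate is available from Python 3.10 onwards.
    phrases = root.findall(PATH)
else:
    # Older ElementTree lacks the != predicate, so filter the parents by hand.
    phrases = [
        child
        for parent in root.iter("div")
        if parent.get("class") not in (None, "extras")
        for child in parent.findall("div[@class='phrase']")
    ]

phrase_texts = [p.text for p in phrases]
print(phrase_texts)
```

On Python 3.7, as used in the question, only the fallback branch applies.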
