
Python lxml.html XPath “attribute not equal” operator not working as expected

I'm trying to run the following script:

from urllib import urlopen #urllib.request for python3
from lxml import html

url =   'http://mpk.lodz.pl/rozklady/1_11_D2D3/00d2/00d2t001.htm?r=KOZINY'+\
        '%20-%20Srebrzy%F1ska,%20Cmentarna,%20Legion%F3w,%20pl.%20Wolno%B6ci'+\
        ',%20Pomorska,%20Kili%F1skiego,%20Przybyszewskiego%20-%20LODOWA'

raw_html = urlopen(url).read()
tree = html.fromstring(raw_html) #need to .decode('windows-1250') in python3
ret = tree.xpath('//td [@class!="naglczas"]')
print ret
assert(len(ret)==1)

I expect it to select the one td whose class is not set to 'naglczas'. Instead, it returns an empty list. Why is that? I suspect there's some silly reason, but I searched and found nothing that would explain it.

Your XPath expression will find

a td element that has a class attribute whose value is not "naglczas"

You seem to want (since the only three td elements with a class all have the class you don't want)

a td element that does not have a class of "naglczas", including a td with no class attribute at all


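Those might sound similar, but they are different. In XPath 1.0, a comparison against @class can only be true if the attribute exists, so @class!="naglczas" is never true for a td without a class attribute, while not(@class="naglczas") is true for it. Something like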

tree.xpath('//td[not(@class="naglczas")]')

should get you what you want.
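
To make the difference concrete, here is a minimal sketch that runs both expressions against a small inline table. The markup is made up for illustration (it only imitates the situation in the question: three td elements with class "naglczas" and one td with no class) and is not the real page.

```python
from lxml import html

# Hypothetical sample markup, not the actual mpk.lodz.pl page.
SAMPLE = """
<table>
  <tr>
    <td class="naglczas">5</td>
    <td class="naglczas">6</td>
    <td class="naglczas">7</td>
    <td>timetable body</td>
  </tr>
</table>
"""

tree = html.fromstring(SAMPLE)

# True only for td elements that HAVE a class attribute with another value.
not_equal = tree.xpath('//td[@class != "naglczas"]')

# True for every td for which '@class = "naglczas"' is false,
# which includes td elements with no class attribute at all.
negated = tree.xpath('//td[not(@class = "naglczas")]')

not_equal_text = [td.text for td in not_equal]
negated_text = [td.text for td in negated]

print(not_equal_text)
print(negated_text)
```

With this sample the first expression matches nothing, because the only td that is not "naglczas" has no class attribute to compare, while the second matches that td.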


Also, you don't need urllib to open the URL; lxml can do that for you with lxml.html.parse().
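
A rough sketch of that approach follows. lxml.html.parse() accepts a filename, a URL, or a file-like object and returns an ElementTree, so getroot() is needed before querying. To keep the example runnable offline it parses an in-memory document; the markup is an assumption made up for illustration, and in the real script you would pass the url string instead.

```python
from io import BytesIO
from lxml import html

# Hypothetical stand-in for the downloaded page; in the real script,
# pass the url string to html.parse() instead of this file-like object.
page = BytesIO(
    b'<html><body><table><tr>'
    b'<td class="naglczas">5</td><td>stop list</td>'
    b'</tr></table></body></html>'
)

doc = html.parse(page)   # returns an ElementTree, not an element
root = doc.getroot()     # the <html> element

cells = root.xpath('//td[not(@class="naglczas")]')
texts = [td.text for td in cells]
print(texts)
```

One caveat, as far as I know: the URL loading built into lxml goes through libxml2, which handles plain http but generally not https, so for https pages you would still fetch the content yourself and hand it to lxml.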
