
Using XPath with Python lxml to query HTML

I am reading an HTML page as a string and parsing it with tree = html.fromstring(data).

I now want to query it with lxml's XPath support. Below is an example of the part I am interested in.

<table class="class">
 <tbody>
  <tr>
   <th class="classTh">
    Overall
   </th>
   <td class="classTd">
    <span class="classSpan">
     GREEN
    </span>
   </td>
  </tr>
 </tbody>
</table>

I query it with the following call:

xpath = '//table/tbody/tr[th="Overall"]/td/span'
e = tree.xpath(xpath)
for i in e:
    print(i.text)

I want to use XPath to get the data I need, but I cannot get this expression to work in lxml. The same markup and XPath work for me in every online tester I have tried.

I have also tried this XPath:

xpath = '//table/tbody/tr[th]/td/span'

which returns every span instead of only the ones in the row whose header matches the filter value. Similarly,

xpath = '//table/tbody/tr[td/span]/th'

returns all the header (filter) values.

So my question is: how do I apply the text-value filter in my XPath correctly?

The XPath that works in lxml is the following:

xpath = "//table/tbody/tr[th[contains(text(), 'Overall')]]/td/span"

This solved my problem.
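
A likely reason the original expression matched nothing, assuming the real page has whitespace around the header text as in the sample above: in XPath 1.0, th="Overall" compares the full string value of the th element, including the surrounding newlines and spaces, so an exact comparison fails, while contains() only needs a substring. The sketch below reproduces this with the sample markup. The variable names and the normalize-space() variant are illustrative additions, not part of the original answer.

```python
from lxml import html

# Sample markup from the question; note the whitespace around "Overall".
data = """
<table class="class">
 <tbody>
  <tr>
   <th class="classTh">
    Overall
   </th>
   <td class="classTd">
    <span class="classSpan">
     GREEN
    </span>
   </td>
  </tr>
 </tbody>
</table>
"""

tree = html.fromstring(data)

# Exact comparison: the string value of <th> includes the surrounding
# whitespace, so it is not equal to "Overall".
exact = tree.xpath('//table/tbody/tr[th="Overall"]/td/span')

# Substring match, as in the accepted answer.
contains = tree.xpath(
    "//table/tbody/tr[th[contains(text(), 'Overall')]]/td/span"
)

# Alternative (an assumption, not from the original post): strip and
# collapse whitespace first, then compare exactly.
normalized = tree.xpath(
    '//table/tbody/tr[normalize-space(th)="Overall"]/td/span'
)

print(len(exact))
print([span.text.strip() for span in contains])
print([span.text.strip() for span in normalized])
```

The normalize-space() form may be preferable when a substring match is too loose, for example if another row's header were "Overall status". Note that i.text on the span also carries the surrounding whitespace, hence the .strip() calls.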


 