Using xpath with python lxml to query html

Question

I am reading an HTML page as a string and parsing it with tree = html.fromstring(data).

I now want to query it with lxml's xpath. Below is an example of the part I am interested in.

<table class="class">
 <tbody>
  <tr>
   <th class="classTh">
    Overall
   </th>
   <td class="classTd">
    <span class="classSpan">
     GREEN
    </span>
   </td>
  </tr>
 </tbody>
</table>

with the call

 xpath = '//table/tbody/tr[th="Overall"]/td/span'
 e = tree.xpath(xpath)
 for i in e:
     print(i.text)

I am using XPath to get the data I need, but I cannot get this expression to work in lxml. Using this exact code and XPath in any online tester works for me.

I have also tried this XPath:

 xpath = '//table/tbody/tr[th]/td/span'

which gets me all the span elements instead of only the ones whose row has the correct filter value. And

 xpath = '//table/tbody/tr[td/span]/th'

gets me all the filter values.

So my question is: how do I apply the text value filter in my XPath correctly?

Answer 1

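The following XPath works in lxml:

 xpath = "//table/tbody/tr[th[contains(text(), 'Overall')]]/td/span"

This solved my problem. The likely reason the original expression fails is that th="Overall" compares against the full string value of the th element, which in this markup includes the newlines and indentation around the word, whereas contains() only needs a substring match.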
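
To see the difference between the equality test and contains(), here is a minimal sketch that runs both expressions against the table from the question. The third expression, using normalize-space(), is not part of the original answer; it is an alternative that trims the whitespace before comparing, so it matches "Overall" exactly rather than as a substring. The variable names exact, substring, and normalized are made up for illustration.

```python
from lxml import html

# Markup copied from the question; note the whitespace around "Overall".
data = """<table class="class">
 <tbody>
  <tr>
   <th class="classTh">
    Overall
   </th>
   <td class="classTd">
    <span class="classSpan">
     GREEN
    </span>
   </td>
  </tr>
 </tbody>
</table>"""

tree = html.fromstring(data)

# Exact comparison against the th string value, which still contains
# the surrounding newlines and indentation.
exact = tree.xpath('//table/tbody/tr[th="Overall"]/td/span')

# Substring match from the answer.
substring = tree.xpath(
    "//table/tbody/tr[th[contains(text(), 'Overall')]]/td/span"
)

# Alternative (not from the original answer): trim whitespace, then compare.
normalized = tree.xpath(
    '//table/tbody/tr[normalize-space(th)="Overall"]/td/span'
)

for name, result in [("exact", exact),
                     ("substring", substring),
                     ("normalized", normalized)]:
    # span.text also carries whitespace, so strip it before printing.
    print(name, [span.text.strip() for span in result])
```

If the page has header cells such as "Overall score" that should not match, the normalize-space() form may be the safer choice, since contains() would accept them as well.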
