How to find xpath elements/tags of a specific node

Question

I have a page with the following structure...

<doc>
  <tbody>
   .
   .
   .
  <tbody>
    <tr>
       <td>
       .
       .
  </tbody>
  ....
</doc>

I'm able to get to the specific table I want with the xpath

response.xpath('//tbody')[8].get()

but I'm struggling with the syntax to get elements/tags within tbody[8]... so far I've tried

>>> response.xpath('//tbody')[8]/tr.get()
Traceback (most recent call last):
File "<console>", line 1, in <module>
NameError: name 'tr' is not defined

along with several other attempts but they all fail due to (I believe) syntax. How can I get to tr and td tags inside tbody? No matter what I try I can't seem to add anything after tbody')[8] & I can't wrap my head around why...

Answer 1

You're on the right track, but you're going to need to provide the whole xpath string as an argument to the xpath() function, rather than trying to stick pieces of it outside.

The response.xpath('//tbody') is returning a list of elements matched by the xpath, and the [8] you have there is a Python index operator, not part of the xpath. But then you're trying to continue writing an xpath after it, and it's just gibberish to Python.

If you take a look at some of the examples in https://docs.scrapy.org/en/latest/topics/selectors.html , you should be able to see what you're doing wrong.
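
One way to read this advice: do the Python indexing first, then run a second, relative query on the single element you picked. The sketch below uses the standard library's xml.etree.ElementTree on a made-up two-table document as a stand-in, so it runs without a Scrapy response; the SAMPLE markup and variable names are assumptions for illustration only. With Scrapy, the analogous call would be response.xpath('//tbody')[8].xpath('./tr').

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of the page in the question: two tbody elements.
SAMPLE = """
<doc>
  <tbody><tr><td>a</td></tr></tbody>
  <tbody><tr><td>b</td><td>c</td></tr><tr><td>d</td></tr></tbody>
</doc>
"""

root = ET.fromstring(SAMPLE)

# Step 1: the query returns a Python list, so [1] is ordinary list indexing.
second_tbody = root.findall(".//tbody")[1]

# Step 2: run a new query on that one element, relative to it ("./").
rows = second_tbody.findall("./tr")
cells = [td.text for td in second_tbody.findall("./tr/td")]

print(len(rows), cells)
```

The point is that everything XPath-related stays inside a query string, and the Python index only ever sits between two complete queries.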

Answer 2

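Your /tr is supposed to go in the same XPath string:

response.xpath('(//tbody)[9]/tr').get()

Also note that although XPath supports indexing like Python, XPath indexes start from 1 instead of 0. So if you get the correct element using index [8] in Python, you will want to use index [9] in the XPath expression. The parentheses matter: (//tbody)[9] picks the ninth tbody in the whole document, which is what the Python index [8] did, whereas //tbody[9] matches only a tbody that is the ninth tbody child of its own parent.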
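
The off-by-one between the two index styles can be checked with a small sketch. It again uses the standard library's xml.etree.ElementTree rather than Scrapy, and the three-element SAMPLE document with its id attributes is a made-up assumption. All three tbody elements share one parent here, so position within the parent and position in the result list agree.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: three sibling tbody elements, labelled for clarity.
SAMPLE = "<doc><tbody id='first'/><tbody id='second'/><tbody id='third'/></doc>"
root = ET.fromstring(SAMPLE)

# Python list index: 0-based, applied to the list the query returned.
by_python_index = root.findall("./tbody")[1]

# XPath position predicate: 1-based, written inside the query string.
by_xpath_position = root.find("./tbody[2]")

print(by_python_index.get("id"), by_xpath_position.get("id"))
```

Both lookups land on the element labelled second, which is the same shift as going from [8] in Python to [9] in XPath.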

2 answers

solution1
0 2021-03-19 22:17:24

solution2
0 2021-03-20 04:10:41
