In Python I had:

from bs4 import BeautifulSoup  # s is a requests.Session set up earlier

response = s.get(url, allow_redirects=False, cookies=cookies, headers=headers)
soup = BeautifulSoup(response.text, 'lxml')
reg_cart = soup.find('form', attrs={"name": "regCart"})
registered_courses = [i.a.text for i in reg_cart.find_all('div', attrs={"class": "course-number"})]
Now I want to replace BeautifulSoup with lxml. Following this guide:
https://timber.io/blog/an-intro-to-web-scraping-with-lxml-and-python/
I tried to implement the same approach and got:
import lxml.html
doc = lxml.html.fromstring(response.content)
registered_courses = doc.xpath('//div[@class="course-number"]/text()')
But for some reason my output is:
['\n\t\t\t\t\t', '\n\t\t\t\t', '\n\t\t\t\t\t', '\n\t\t\t\t', '\n\t\t\t\t\t', '\n\t\t\t\t', '\n\t\t\t\t\t']
while previously it correctly showed the course numbers.
How can I fix this? Also, how can I change my XPath to return only the div tags under the regCart form, rather than from the whole response?
For example the html code looks something like:
<form name="regCart" ....>
</div><div class="entry-spacer"></div><div class="cart-entry">
<div class="course-number">
<a href="https://university.com/rishum/course/236756">236756</a>
</div>
<div class="course-name">
מבוא למערכות לומדות
</div>
<div class="course-points">
3.0 נק'
</div>
<div class="entry-group">
קבוצה 13
</div>
From this I want to return 236756.
The issue is in your XPath expression: //div[@class="course-number"]/text()
Given markup like:
<div class="course-number">
<a href="https://university.com/rishum/course/236756">236756</a>
</div>
the /text() step selects only the text nodes that are direct children of the div, and here those are nothing but the whitespace (newlines and tabs) surrounding the <a> element. That is exactly the list of '\n\t\t...' strings you saw. The course number you want is the text of the <a> element itself, so the correct XPath is: //div[@class="course-number"]/a/text()
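To also restrict the search to the regCart form (the second part of the question), you can anchor the expression on the form itself with a standard XPath predicate. A minimal sketch, using a stripped-down stand-in for the real page (the decoy div outside the form is added here just to show the scoping works):

```python
import lxml.html

# Simplified HTML resembling the page in question; the outer
# course-number div is a decoy that should NOT be matched.
html = '''
<html><body>
<div class="course-number"><a href="#">999999</a></div>
<form name="regCart">
  <div class="cart-entry">
    <div class="course-number">
      <a href="https://university.com/rishum/course/236756">236756</a>
    </div>
  </div>
</form>
</body></html>
'''

doc = lxml.html.fromstring(html)

# Scope to the regCart form, then take the text of the <a> element
# inside each course-number div anywhere below it (// = any depth).
registered_courses = doc.xpath(
    '//form[@name="regCart"]//div[@class="course-number"]/a/text()'
)
print(registered_courses)  # ['236756']
```

With response.content instead of the literal html string, this mirrors the BeautifulSoup version: the form predicate plays the role of soup.find('form', attrs={"name": "regCart"}), and /a/text() replaces i.a.text.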