Scraping a website using Python to search for a specific value

Question

Language: Python
Website: https://www.curseforge.com/minecraft/mc-mods/ae2-extras/files/3120250
Goal: get the project id and store it as a variable

Snippet from website

<div class="w-full flex justify-between">
    <span>Project ID</span>
    <span>421104</span>
</div>

I want to store the project ID 421104 in a variable. I've tried using lxml to get all the divs with the class 'w-full flex justify-between', but the result is empty.

My code:

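from lxml import html
import requests

# Page given at the top of the question
url = "https://www.curseforge.com/minecraft/mc-mods/ae2-extras/files/3120250"

page = requests.get(url)
doc = html.fromstring(page.content)
# Match divs whose class attribute is exactly this string
divs = doc.xpath("//div[@class='w-full flex justify-between']")
print(divs)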

Output: []

What am I doing wrong? I have requests and lxml installed in my environment.
Once I have the list of divs, how can I extract 421104 from the first div and store it in a local variable?

EDIT 2: I've solved it. The initial request was being blocked by Cloudflare. I've posted my solution as an answer.

Answer 1

Solution:

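from lxml import html
import cloudscraper

# Page given in the question
url = "https://www.curseforge.com/minecraft/mc-mods/ae2-extras/files/3120250"

# cloudscraper is used in place of plain requests to get past the Cloudflare check
scraper = cloudscraper.create_scraper()
page = scraper.get(url).text

doc = html.fromstring(page)
divs = doc.xpath("//div[@class='w-full flex justify-between']")
# text_content() of the first div is "Project ID 421104" plus whitespace
el = divs[0].text_content()
# The ID is the last whitespace-separated token
projectID = el.split()[-1]
print(projectID)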
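
A possible variation, in case the first matching div is not always the Project ID row: select the span by its label instead of by position. This is only a sketch run against the snippet from the question, not the live page; SNIPPET and extract_project_id are names made up for illustration, and the real markup may differ.

```python
from lxml import html

# Markup copied from the question; the live page may look different.
SNIPPET = """
<div class="w-full flex justify-between">
    <span>Project ID</span>
    <span>421104</span>
</div>
"""


def extract_project_id(markup):
    """Return the text of the span right after the 'Project ID' label, or None."""
    doc = html.fromstring(markup)
    # Find the label span, then take its first following sibling span.
    values = doc.xpath(
        "//span[normalize-space()='Project ID']/following-sibling::span[1]/text()"
    )
    return values[0].strip() if values else None


project_id = extract_project_id(SNIPPET)
print(project_id)
```

With the real page, the string returned by scraper.get(url).text would be passed in place of SNIPPET. Returning None when the label is missing avoids the IndexError that divs[0] raises on an empty result.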

Answer 2

Maybe you got <Response [403]> when you ran print(page). HTTP 403 is a status code meaning that access to the requested resource is forbidden.
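
One way to confirm this before parsing is to look at the status code of the response. The helper below is a small sketch using only the standard library; explain_status is a hypothetical name, and it assumes you pass it the integer from page.status_code.

```python
from http import HTTPStatus


def explain_status(status_code):
    """Return a short human-readable summary for an HTTP status code."""
    try:
        phrase = HTTPStatus(status_code).phrase
    except ValueError:
        # Code is not one of the standard registered values.
        phrase = "Unknown status"
    if status_code >= 400:
        # 4xx and 5xx mean the body is an error page, not the content you wanted.
        return f"{status_code} {phrase}: request failed"
    return f"{status_code} {phrase}"


print(explain_status(403))
print(explain_status(200))
```

With page = requests.get(url) you would call explain_status(page.status_code). requests also provides page.raise_for_status(), which raises an exception for 4xx and 5xx responses, so an empty XPath result is not silently mistaken for a selector problem.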
