Language: Python
Website: https://www.curseforge.com/minecraft/mc-mods/ae2-extras/files/3120250
Goal: get the project id and store it as a variable
Snippet from website
<div class="w-full flex justify-between">
<span>Project ID</span>
<span>421104</span>
</div>
I want to store the project ID 421104 in a variable. I've tried using lxml to get all the divs with the classes 'w-full flex justify-between', but the result is empty.
My code:
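from lxml import html
import requests
# URL of the page given at the top of the question
url = "https://www.curseforge.com/minecraft/mc-mods/ae2-extras/files/3120250"
page = requests.get(url)
doc = html.fromstring(page.content)
# Select every div whose class attribute matches this exact string
divs = doc.xpath("//div[@class='w-full flex justify-between']")
print(divs)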
Output: []
What am I doing wrong? I have requests and lxml installed in my environment.
Then, once I have the list of divs, how would I scrape the 421104 from that first div and store it in a local variable?
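For the second part of the question, here is a minimal sketch of pulling the ID out of the markup once it has been fetched. It parses the snippet shown above as an inline string, so it runs without a network request. The names SNIPPET and extract_project_id are made up for illustration, and the XPath assumes the value always sits in the span right after the one labelled "Project ID".

```python
from lxml import html

# The snippet from the question, inlined so no request is needed.
SNIPPET = """
<div class="w-full flex justify-between">
<span>Project ID</span>
<span>421104</span>
</div>
"""

def extract_project_id(markup):
    """Return the text of the span following the 'Project ID' label, or None."""
    doc = html.fromstring(markup)
    # Locate the label span, then take the first span that follows it.
    values = doc.xpath(
        "//span[normalize-space()='Project ID']/following-sibling::span[1]/text()"
    )
    return values[0].strip() if values else None

project_id = extract_project_id(SNIPPET)
print(project_id)
```

Matching on the label text rather than on the class string may hold up better if the site changes its CSS classes, but it only helps once the real page HTML is actually returned by the request.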
EDIT 2: I've solved it. The issue was that the initial request was getting blocked by Cloudflare. I posted my solution as an answer.
Solution:
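from lxml import html
import cloudscraper
# URL of the page given at the top of the question
url = "https://www.curseforge.com/minecraft/mc-mods/ae2-extras/files/3120250"
# cloudscraper returns a requests-like session that can get past the Cloudflare check
scraper = cloudscraper.create_scraper()
page = scraper.get(url).text
doc = html.fromstring(page)
divs = doc.xpath("//div[@class='w-full flex justify-between']")
# Text of the first div is "Project ID 421104" plus whitespace
el = divs[0].text_content()
# The last whitespace-separated token is the ID
projectID = el.split()[-1]
print(projectID)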
Maybe you got <Response [403]> back when you print(page). HTTP 403 is a status code meaning that access to the requested resource is forbidden.
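A rough sketch of checking the status code before parsing, so a blocked request fails loudly instead of producing an empty list. The helper ensure_ok is hypothetical, and the blocked object is only a stand-in for a real response with a status_code attribute, so the example runs offline.

```python
from http import HTTPStatus
from types import SimpleNamespace

def ensure_ok(page):
    """Return the response unchanged if it is 200 OK, otherwise raise."""
    if page.status_code != HTTPStatus.OK:
        try:
            phrase = HTTPStatus(page.status_code).phrase
        except ValueError:
            # Non-standard codes are not known to the standard library.
            phrase = "Unknown"
        raise RuntimeError(f"request failed: {page.status_code} {phrase}")
    return page

# Stand-in for the blocked response described above (assumption: status 403).
blocked = SimpleNamespace(status_code=403)
try:
    ensure_ok(blocked)
    message = "ok"
except RuntimeError as err:
    message = str(err)
print(message)
```

With a real requests response, calling page.raise_for_status() does a similar job and raises an HTTPError for 4xx and 5xx codes.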