Python requests-html does not return the page content

Question

I'm new to Python and would like your advice on an issue I ran into recently. I'm working on a small project that scrapes a comic website to download a chapter (images). However, when I printed the page content for testing (because BeautifulSoup's select() returned no results), it showed only a single line of script instead of the page's HTML:

'document.cookie="VinaHost-Shield=a7a00919549a80aa44d5e1df8a26ae20"+"; path=/";window.location.reload(true);'

Any help would be really appreciated.

from requests_html import HTMLSession
session = HTMLSession()

res = session.get("https://truyenqqpro.com/truyen-tranh/dao-hai-tac-128-chap-1060.html")
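# render() runs the page's JavaScript in a headless browser and updates res.html
res.html.render()
# Note: res.content is the raw, pre-render response body; the rendered markup is in res.html.html
print(res.content)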

I also tried this, but the result was the same.

import requests, bs4

url = "https://truyenqqpro.com/truyen-tranh/dao-hai-tac-128-chap-1060.html"
res = requests.get(url, headers={"User-Agent": "Requests"})
res.raise_for_status()
# soup = bs4.BeautifulSoup(res.text, "html.parser")
# onePiece = soup.select(".page-chapter")
print(res.content)

Answer 1

import urllib.request
request_url = urllib.request.urlopen('https://truyenqqpro.com/truyen-tranh/dao-hai-tac-128-chap-1060.html')
print(request_url.read())

This returns the HTML of the page. That HTML loads several images, so you need a regex to track down the image URLs and then download them.
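As a rough illustration of the regex step, here is a minimal sketch that pulls the src attribute out of img tags. The sample HTML and the example.com URLs are made up for the demonstration; only the .page-chapter class name comes from the question, and the real page's markup may differ.

```python
import re

# Matches <img ... src="..."> or <img ... src='...'> and captures the URL.
IMG_SRC_RE = re.compile(r'<img[^>]+src=["\']([^"\']+)["\']', re.IGNORECASE)


def find_image_urls(html):
    """Return every img src value found in the given HTML string."""
    return IMG_SRC_RE.findall(html)


# Hypothetical markup, standing in for the downloaded page.
sample_html = """
<div class="page-chapter"><img src="https://example.com/img/001.jpg" alt="page 1"></div>
<div class="page-chapter"><img alt="page 2" src='https://example.com/img/002.jpg'></div>
"""

print(find_image_urls(sample_html))
```

A regex like this is fine for a quick script, but an HTML parser such as BeautifulSoup is usually more robust once the markup gets irregular.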

Answer 2

This response means the site expects a client that can run JavaScript: the script sets a cookie and then reloads the page. To get the real content, you need a workaround that handles that step.
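One possible workaround, without a full browser, is to read the cookie out of the challenge script and repeat the request with it. The sketch below uses only the standard library; parse_challenge_cookie and fetch_with_challenge are hypothetical helper names, and it assumes the site accepts the cookie on a plain second request, which I have not verified. The site may also require a browser-like User-Agent header.

```python
import re
import urllib.request

# Captures the name and value from a body like: document.cookie="name=value"+...
COOKIE_RE = re.compile(r'document\.cookie="([^=";]+)=([^";]+)"')


def parse_challenge_cookie(body):
    """Return (name, value) if the body is a cookie-setting script, else None."""
    match = COOKIE_RE.search(body)
    return match.groups() if match else None


def fetch_with_challenge(url):
    """Fetch url; if a cookie challenge comes back, retry once with that cookie."""
    with urllib.request.urlopen(url, timeout=30) as res:
        body = res.read().decode("utf-8", errors="replace")

    cookie = parse_challenge_cookie(body)
    if cookie is None:
        return body  # No challenge detected: this is already the page.

    name, value = cookie
    # Assumption: sending the cookie back is enough to get past the check.
    request = urllib.request.Request(url, headers={"Cookie": f"{name}={value}"})
    with urllib.request.urlopen(request, timeout=30) as res:
        return res.read().decode("utf-8", errors="replace")
```

Calling fetch_with_challenge with the chapter URL from the question would then return either the page HTML or, if the site checks more than the cookie, the same challenge script again.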

I usually use Splash, the Scrapinghub rendering engine; adding a short wait on the page is enough for all the content to render. Other tools that render pages the same way are Selenium for Python and Puppeteer for JavaScript.

Links: Splash and Puppeteer
