I want to practice parsing values from a website. However, when I parse the comments from a Steam announcement, I can only get the first page of comments. How do I crawl all of the comments?
Here is my code:
from bs4 import BeautifulSoup
import urllib.request

url = 'http://steamcommunity.com/games/dota2/announcements/detail/1449457773770927103'
html = urllib.request.urlopen(url).read()
soup = BeautifulSoup(html, 'lxml')

# Print the text of every comment present in the initial HTML (first page only)
for t in soup.body.find_all('div', attrs={'class': 'commentthread_comment_text'}):
    print(t.text)
If you open your browser's developer tools, switch to the Network tab, and then click the next-page button, you'll see that the page makes a request to the following URL:
https://steamcommunity.com/comment/ClanAnnouncement/render/103582791433224455/1449457773770927103/
EDIT:
In the response body you'll see the following three properties: start, pagesize, and total_count. If you keep increasing the start query parameter by pagesize until it reaches total_count, you'll be able to fetch all comments, for example: https://steamcommunity.com/comment/ClanAnnouncement/render/103582791433224455/1449457773770927103/?start=10
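Here is a rough sketch of that paging loop, in case it helps. It only relies on the start, pagesize, and total_count properties mentioned above. Some parts are assumptions on my side: that the endpoint answers a plain GET with JSON, and that the HTML for each page sits under a key named comments_html (check the actual response in the Network tab and adjust html_key if it differs). The function names fetch_page and iter_comment_html are made up for this example.

```python
import json
import urllib.parse
import urllib.request

RENDER_URL = ('https://steamcommunity.com/comment/ClanAnnouncement/render/'
              '103582791433224455/1449457773770927103/')


def fetch_page(start):
    """Request one page of comments and return the decoded JSON body.

    Assumption: the endpoint accepts a GET with ?start=N and returns JSON.
    """
    query = urllib.parse.urlencode({'start': start})
    with urllib.request.urlopen(RENDER_URL + '?' + query) as response:
        return json.loads(response.read().decode('utf-8'))


def iter_comment_html(fetch=fetch_page, html_key='comments_html'):
    """Yield the HTML fragment of every page, advancing start by pagesize.

    html_key is an assumed property name, not confirmed by the answer above.
    """
    start = 0
    while True:
        page = fetch(start)
        yield page.get(html_key, '')

        pagesize = int(page.get('pagesize', 0))
        total_count = int(page.get('total_count', 0))
        start += pagesize
        # Stop once every comment is covered, or if the server reports a
        # page size of zero (which would otherwise loop forever).
        if pagesize <= 0 or start >= total_count:
            break
```

Each yielded fragment can then go through the same BeautifulSoup loop as in the question, i.e. BeautifulSoup(fragment, 'lxml').find_all('div', attrs={'class': 'commentthread_comment_text'}). Passing the fetch function as a parameter makes it easy to try the loop with canned responses before hitting the real site.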