
Python - parsing comments from Steam

I want to practice parsing values from a website. However, when I parse the comments on a Steam announcement, I can only get the first page of comments. How do I crawl all the comments?

Here is my code:

from bs4 import BeautifulSoup
import urllib.request

url = 'http://steamcommunity.com/games/dota2/announcements/detail/1449457773770927103'
html = urllib.request.urlopen(url).read()
soup = BeautifulSoup(html, 'lxml')
# This HTML only contains the comments rendered into the initial page.
for t in soup.body.find_all('div', attrs={'class': 'commentthread_comment_text'}):
    print(t.text)

If you open your browser's developer console, go to the Network tab, and then click the Next button under the comments, you'll see that the page makes a request to the following URL:

https://steamcommunity.com/comment/ClanAnnouncement/render/103582791433224455/1449457773770927103/

EDIT:

In the response body you'll see the following three properties: start, pagesize, and total_count. If you keep increasing the start query parameter, you'll be able to fetch all the comments:

https://steamcommunity.com/comment/ClanAnnouncement/render/103582791433224455/1449457773770927103/?start=10

https://steamcommunity.com/comment/ClanAnnouncement/render/103582791433224455/1449457773770927103/?start=20
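
Putting the pieces together, a minimal sketch of the paging loop might look like the following. It assumes the endpoint answers a plain GET with a JSON body, and that the rendered comments arrive in a field named comments_html alongside start, pagesize and total_count; that field name is an assumption, so check it against the response in your Network tab. The helper names (page_url, page_starts, fetch_page, comment_texts, all_comments) are made up for this example.

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE_URL = ('https://steamcommunity.com/comment/ClanAnnouncement/render/'
            '103582791433224455/1449457773770927103/')


def page_url(start):
    """Build the URL for the page of comments beginning at offset `start`."""
    return BASE_URL + '?' + urlencode({'start': start})


def page_starts(total_count, pagesize):
    """Return the `start` offset of every page: 0, pagesize, 2 * pagesize, ..."""
    return list(range(0, total_count, pagesize))


def fetch_page(start):
    """Download one page and decode the JSON response body."""
    with urllib.request.urlopen(page_url(start)) as response:
        return json.loads(response.read().decode('utf-8'))


def comment_texts(comments_html):
    """Extract the comment bodies from one page of rendered HTML."""
    # Imported here so the helpers above only need the standard library.
    from bs4 import BeautifulSoup
    soup = BeautifulSoup(comments_html, 'lxml')
    divs = soup.find_all('div', class_='commentthread_comment_text')
    return [div.get_text(strip=True) for div in divs]


def all_comments():
    """Fetch the first page, read the paging info, then walk the rest."""
    first = fetch_page(0)
    # ASSUMPTION: the HTML fragment lives under the 'comments_html' key.
    comments = comment_texts(first['comments_html'])
    for start in page_starts(first['total_count'], first['pagesize'])[1:]:
        comments.extend(comment_texts(fetch_page(start)['comments_html']))
    return comments
```

Calling all_comments() returns a list of strings that you can print one by one, just like in the question. Reading pagesize and total_count from the first response avoids hard-coding the step of 10 seen in the URLs above.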
