How do I follow links (or scrape multiple links) when web scraping with urllib2?

Question

I am attempting to scrape the URL 'http://steamcommunity.com/market/search?q=&category_730_ItemSet%5B%5D=any&category_730_TournamentTeam%5B%5D=any&category_730_Weapon%5B%5D=any&category_730_Type%5B%5D=tag_CSGO_Type_Knife&appid=730#p1' (purely for informational purposes), but I cannot figure out how to go to the next page. My current code is below; it just loops through the first page repeatedly instead of moving on to the next page.

import urllib2
from bs4 import BeautifulSoup

page_num = 1

while True:
    url = 'http://steamcommunity.com/market/search?q=&category_730_ItemSet%5B%5D=any&category_730_TournamentTeam%5B%5D=any&category_730_Weapon%5B%5D=any&category_730_Type%5B%5D=tag_CSGO_Type_Knife&appid=730#p' + str(page_num)
    read_url = urllib2.urlopen(url).read()
    market_page = BeautifulSoup(read_url)

    for i in market_page('div', {'class' : 'market_listing_row      market_recent_listing_row market_listing_searchresult'}):
        item_name = i.find_all('span', {'class' : 'market_listing_item_name'})[0].get_text()
        price = i.find_all('span')[1].get_text()
        page_num += 1
        print  item_name + ' costs ' + price

EDIT: Also, the problem with the page I'm trying to scrape is that the links to the next page do not have any hrefs, so I was using a loop to try to go to different URLs, but it just scrapes the first URL repeatedly.
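
One likely explanation for the repeated first page, offered as a sketch rather than a confirmed diagnosis of this site: the part of a URL after `#` is a fragment, which the browser handles on the client side and which the server does not use to choose a response. The snippet below strips the fragment with the standard library's `urldefrag` and shows that the `#p1` and `#p2` URLs reduce to the same request. The shortened query string and the helper name `request_target` are illustrative only.

```python
try:
    from urllib.parse import urldefrag  # Python 3
except ImportError:
    from urlparse import urldefrag  # Python 2

# Query string shortened for readability; the fragment behaves the same
# way with the full set of parameters from the question.
base = 'http://steamcommunity.com/market/search?q=&appid=730'


def request_target(url):
    """Return the URL without its fragment, i.e. the part the server acts on."""
    without_fragment, _fragment = urldefrag(url)
    return without_fragment


# Both "pages" collapse to the same URL once the fragment is removed.
print(request_target(base + '#p1') == request_target(base + '#p2'))
```

If that is what is happening here, changing only the number after `#p` cannot change what `urllib2.urlopen` receives, which would match the behaviour described in the question.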

Answer 1

import urllib2
from bs4 import BeautifulSoup

pages = 90  # number of result pages to request

# Page numbers in the URL start at 1 ('#p1'), so count from 1 to pages inclusive.
for page in range(1, pages + 1):
    # The '#p<N>' suffix is kept exactly as in the question.
    url = 'http://steamcommunity.com/market/search?q=&category_730_ItemSet%5B%5D=any&category_730_TournamentTeam%5B%5D=any&category_730_Weapon%5B%5D=any&category_730_Type%5B%5D=tag_CSGO_Type_Knife&appid=730#p' + str(page)
    read_url = urllib2.urlopen(url).read()
    market_page = BeautifulSoup(read_url)

    # Select the result rows by their class attribute.
    for i in market_page('div', {'class' : 'market_listing_row      market_recent_listing_row market_listing_searchresult'}):
        item_name = i.find_all('span', {'class' : 'market_listing_item_name'})[0].get_text()
        # The second span inside the row is taken to hold the price.
        price = i.find_all('span')[1].get_text()
        print item_name + ' costs ' + price
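
The loop above requests a fixed number of pages and varies only the `#p` suffix. A variation, sketched here under assumptions, puts the page offset in the query string (which the server does see) and stops as soon as a page comes back empty. The endpoint `http://example.com/search`, the parameter names `start` and `count`, and the helpers `page_url` and `scrape_all` are all hypothetical; the real parameter names would have to be read from the requests the browser makes when you click through to the next page.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2


def page_url(base, page_num, page_size=10):
    """Build a URL whose query string (not its fragment) selects the page."""
    # 'start' and 'count' are assumed names, not confirmed for any real site.
    params = [('start', (page_num - 1) * page_size), ('count', page_size)]
    return base + '?' + urlencode(params)


def scrape_all(base, fetch_rows, max_pages=90):
    """Collect rows page by page until a page is empty or max_pages is hit."""
    rows = []
    for page_num in range(1, max_pages + 1):
        page_rows = fetch_rows(page_url(base, page_num))
        if not page_rows:
            break  # an empty page is treated as the end of the results
        rows.extend(page_rows)
    return rows


# Stand-in for a real download-and-parse step, so the sketch runs offline.
fake_site = {
    'http://example.com/search?start=0&count=10': ['Knife A', 'Knife B'],
    'http://example.com/search?start=10&count=10': ['Knife C'],
}
print(scrape_all('http://example.com/search',
                 lambda url: fake_site.get(url, [])))
```

In real use, `fetch_rows` would wrap the `urllib2.urlopen(...)` and BeautifulSoup steps from the answer and return the list of rows found on that page.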

solution1
1 2015-06-26 20:26:35
