
Scraping data from two different pages that have the same URL

I am trying to scrape data from this site:

http://www.professorpaddle.com/rivers/riverlist.asp

The URL is the same for different states. For example, the Washington page and the Oregon page have the same URL. How can I write a single Python script that scrapes the data for each state based on the user's choice?

In this case, the data is generated dynamically on the page, so you need to send POST requests to get the data from the server. You can do that using requests. If you use Firefox or Google Chrome, you can use the inspect tool to see which requests the page's JavaScript makes. In this specific case, you can get the data this way:

import requests

# for Washington
data = requests.post("http://www.professorpaddle.com/rivers/riverlist.asp", data={"hstateid":13}).text 

To get all data:

all_data = []
for state in range(65):  # range of hstateid values, found by hand
    data = requests.post("http://www.professorpaddle.com/rivers/riverlist.asp", data={"hstateid":state}).text
    all_data.append(data)
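
To pick a state based on the user's choice, one option is to map state names to their hstateid values and build the POST data from that. The following is only a minimal sketch: the names STATE_IDS, build_payload and fetch_state are hypothetical, and the only ID taken from the answer above is 13 for Washington. The other IDs (Oregon, for example) would have to be looked up with the browser's inspect tool and added to the dictionary.

```python
URL = "http://www.professorpaddle.com/rivers/riverlist.asp"

# Only Washington's ID (13) comes from the answer above. The remaining
# IDs are not known here; find them in the browser's network inspector
# and add them to this mapping.
STATE_IDS = {"washington": 13}


def build_payload(state_name):
    """Return the POST form data for a state name (case-insensitive)."""
    key = state_name.strip().lower()
    if key not in STATE_IDS:
        raise KeyError(f"No hstateid known for {state_name!r}")
    return {"hstateid": STATE_IDS[key]}


def fetch_state(state_name, timeout=10):
    """POST the state's ID to the river list page and return the HTML."""
    import requests  # imported here so build_payload works without it

    response = requests.post(URL, data=build_payload(state_name), timeout=timeout)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.text
```

With this in place, something like fetch_state(input("State: ")) would return the HTML for the chosen state, which you can then parse as usual.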
