
Web Scraping the Registration Reset Website

I am looking for some advice on scraping this website. My plan is to use the header keys to pull the data from the page into a list of tuples, which I will then convert into a data frame.
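As a rough illustration of the "header keys plus list of tuples" idea, here is a minimal sketch that uses only the standard library. It assumes the results are rendered as a plain HTML table with `<th>` header cells and `<td>` data cells. The sample markup, the column names, and the `TableParser` and `parse_table` names are all hypothetical; the real page's structure has not been checked.

```python
from html.parser import HTMLParser

# Hypothetical sample markup; the real page's table structure is an assumption.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>County</th></tr>
  <tr><td>Example A</td><td>Franklin</td></tr>
  <tr><td>Example B</td><td>Summit</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect <th> text as headers and each row of <td> text as a tuple."""

    def __init__(self):
        super().__init__()
        self.headers = []
        self.rows = []
        self._cell_tag = None    # "th" or "td" while inside a cell
        self._cell_text = []
        self._current_row = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._current_row = []
        elif tag in ("th", "td"):
            self._cell_tag = tag
            self._cell_text = []

    def handle_data(self, data):
        # Only keep text that appears inside a header or data cell.
        if self._cell_tag is not None:
            self._cell_text.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell_tag == tag:
            text = "".join(self._cell_text).strip()
            if tag == "th":
                self.headers.append(text)
            else:
                self._current_row.append(text)
            self._cell_tag = None
        elif tag == "tr" and self._current_row:
            # A header-only row leaves _current_row empty and is skipped.
            self.rows.append(tuple(self._current_row))
            self._current_row = []


def parse_table(html):
    """Return (headers, rows) where rows is a list of tuples."""
    parser = TableParser()
    parser.feed(html)
    return parser.headers, parser.rows


headers, rows = parse_table(SAMPLE_HTML)
```

If pandas is installed, the result can then be turned into a data frame with `pandas.DataFrame(rows, columns=headers)`.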

The issue is navigating between result pages with a for loop (for example, moving from the first 50 results to the next 50).

What attribute, class, etc. would I need to access so that I can iterate from tab to tab until the maximum number of rows is reached?
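Independent of which attribute or class ends up driving the navigation, the loop itself could be sketched as below. This is only an outline under stated assumptions: `fetch_page` is a hypothetical callable that returns the rows for a given offset, and how the site actually pages (a URL parameter, a form post, or JavaScript) is not established here. `fake_fetch` and `FAKE_DATA` are stand-ins so the sketch runs without network access.

```python
def scrape_all(fetch_page, page_size=50, max_rows=None):
    """Call fetch_page(offset, page_size) until the data or max_rows runs out."""
    all_rows = []
    offset = 0
    while max_rows is None or offset < max_rows:
        page = fetch_page(offset, page_size)
        if not page:
            break                      # empty page: nothing left to read
        all_rows.extend(page)
        if len(page) < page_size:
            break                      # short page: this was the last one
        offset += page_size
    if max_rows is not None:
        all_rows = all_rows[:max_rows]  # trim any overshoot from the last page
    return all_rows


# Stand-in data source (hypothetical); replace with a real request per page.
FAKE_DATA = [(i, f"row {i}") for i in range(120)]


def fake_fetch(offset, page_size):
    return FAKE_DATA[offset:offset + page_size]


rows = scrape_all(fake_fetch, page_size=50)
```

Stopping on an empty or short page means the total row count does not need to be known in advance; `max_rows` is only an optional cap.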

https://www6.sos.state.oh.us/ords/f?p=119:REGRESET:0

The classes shown in the browser's Inspect Element panel sometimes differ from the classes in the HTML that the server actually returns. Try writing the page to a file like this:

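import requests

# Fetch the page as the server sends it, before any JavaScript runs
response = requests.get("https://www6.sos.state.oh.us/ords/f?p=119:REGRESET:0")

# Write the HTML text itself; str(response) would only give "<Response [200]>"
with open("file.html", "w", encoding="utf-8") as f:
    f.write(response.text)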

Open the file in a browser and inspect it; you will then see the classes you actually need to scrape.
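As an alternative to inspecting the saved file by eye, the classes present in the raw HTML can be listed programmatically. The following is a small sketch using only the standard library; `ClassCollector`, `collect_classes`, and the sample markup are hypothetical names and data chosen for illustration, not taken from the real page.

```python
from collections import Counter
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Count every (tag, class) pair that appears in the HTML."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                # A class attribute may hold several space-separated names.
                for cls in value.split():
                    self.counts[(tag, cls)] += 1


def collect_classes(html):
    collector = ClassCollector()
    collector.feed(html)
    return collector.counts


# Hypothetical markup standing in for the contents of the saved file.
SAMPLE = (
    '<table class="report"><tr>'
    '<td class="cell data">a</td><td class="cell">b</td>'
    "</tr></table>"
)
counts = collect_classes(SAMPLE)
```

To run it on the saved page, read the file's text with `open("file.html", encoding="utf-8").read()` and pass that string to `collect_classes`; a (tag, class) pair that repeats once per result row is a likely candidate for the scraper's selector.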

