Web Scraping the Registration Reset Website

I am trying to get some perspective on web scraping this website. Essentially, I want to use the header keys to scrape the data from the website into a list of tuples, which I will then convert into a data frame.

The issue is navigating between result pages with a for loop (for example, moving from the first 50 results to the next 50).

What attribute, class, etc. would I need to access so that I can iterate from tab to tab until the maximum number of rows is reached?

https://www6.sos.state.oh.us/ords/f?p=119:REGRESET:0

The classes shown in the browser's Inspect Element panel sometimes differ from the classes in the HTML that the server actually returns, because the browser shows the page after its scripts have run. Try saving the page to a file exactly as requests receives it, for example:

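import requests

# Fetch the page as the server returns it (no JavaScript is executed).
response = requests.get("https://www6.sos.state.oh.us/ords/f?p=119:REGRESET:0")
response.raise_for_status()

# Write the raw response bytes, not str(response), which is only "<Response [200]>".
with open("file.html", "wb") as f:
    f.write(response.content)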

Open the saved file in a browser and inspect it there; the classes you see in that copy are the ones your scraper should target.
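
If you would rather not inspect the saved copy by hand, something like the following might help. It is only a rough sketch using Python's standard-library html.parser, not a replacement for looking at the page yourself; the names ClassCollector and collect_classes are made up for this example and are not part of any library.

```python
from collections import Counter
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Count each (tag, class name) pair that appears in the parsed HTML."""

    def __init__(self):
        super().__init__()
        self.class_counts = Counter()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                # One class attribute can hold several space-separated names.
                for cls in value.split():
                    self.class_counts[(tag, cls)] += 1


def collect_classes(html_text):
    """Return a Counter of (tag, class name) pairs found in html_text."""
    collector = ClassCollector()
    collector.feed(html_text)
    collector.close()
    return collector.class_counts
```

To use it, read the text of file.html and pass it to collect_classes, then look at the most frequent pairs with the Counter's most_common method. Classes that repeat once per row (often on tr or td tags) are usually the ones worth targeting, though that is a rule of thumb and not something this page guarantees.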
