How can I iterate through the pages of a website using Python?
I'm new to software development, and I'm not sure how to go about this. I want to visit every page of a website and grab a specific bit of data from each one.
My problem is that I don't know how to iterate through all of the existing pages without knowing the individual URLs ahead of time. For example, I want to visit every page whose URL starts with
"http://stackoverflow.com/questions/"
Is there a way to compile a list and then iterate through that, or is it possible to do this without creating a giant list of URLs?
To grab a specific bit of data from a website, you could use a web scraping tool such as Scrapy.
If the required data is generated by JavaScript, you might need a browser-like tool such as Selenium WebDriver, and you would have to implement the crawling of links by hand.
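To make "crawling the links by hand" more concrete, here is a minimal sketch that uses only the Python 3 standard library. It assumes the pages are plain HTML (no JavaScript rendering) and that every page you care about is reachable by following links from a starting page. The names `LinkCollector`, `extract_links`, `fetch_page`, `crawl`, and the `max_pages` limit are made up for this illustration; they are not part of Scrapy or Selenium.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, page_url, prefix):
    """Return the absolute links in html that start with prefix."""
    parser = LinkCollector()
    parser.feed(html)
    links = []
    for href in parser.hrefs:
        # Resolve relative links and drop any "#fragment" part.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        if absolute.startswith(prefix) and absolute not in links:
            links.append(absolute)
    return links


def fetch_page(url):
    """Download a page and decode it as text (assumed default fetcher)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, prefix, fetch=fetch_page, max_pages=50):
    """Yield (url, html) for pages reachable from start_url under prefix."""
    queue = deque([start_url])
    seen = {start_url}  # URLs already queued, so no page is visited twice
    visited = 0
    while queue and visited < max_pages:
        url = queue.popleft()
        html = fetch(url)
        visited += 1
        yield url, html
        for link in extract_links(html, url, prefix):
            if link not in seen:
                seen.add(link)
                queue.append(link)
```

Because `crawl` is a generator, you can loop over it with `for url, html in crawl(start, prefix):` and pull your data out of `html` one page at a time. The URLs are discovered as the crawl goes, so you never need to know them up front. The `max_pages` cap is only there to keep an experiment from running away on a large site.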
For example, you can write a simple for loop, like this:
    def web_iterate():
        # Build each question URL from a numeric ID and print it (Python 3).
        base_link = "http://stackoverflow.com/questions/"
        for i in range(24):
            print("%s%d" % (base_link, i))

    web_iterate()
The output will be:
http://stackoverflow.com/questions/0
http://stackoverflow.com/questions/1
http://stackoverflow.com/questions/2
...
http://stackoverflow.com/questions/23
It's just an example. You can pass in the question numbers you need and do whatever you want with each URL.
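On the "giant list of URLs" part of the question: one possible approach, assuming the pages really are addressed by consecutive numeric IDs, is to use a generator so that each URL is built only when the loop asks for it. The function name `question_urls` and its parameters are hypothetical, chosen just for this sketch.

```python
def question_urls(first_id, last_id, base_link="http://stackoverflow.com/questions/"):
    """Lazily yield one question URL per ID in [first_id, last_id]."""
    for question_id in range(first_id, last_id + 1):
        # Each URL is created on demand; no list is kept in memory.
        yield "%s%d" % (base_link, question_id)


if __name__ == "__main__":
    for url in question_urls(1, 3):
        print(url)  # replace this with your own fetch-and-parse step
```

Not every number will correspond to an existing page, so whatever fetches each URL should be prepared to handle a missing page and move on.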