How can I speed up this script?

I am required to retrieve 8000 answers from a website for research purposes (auto-filling a form and submitting it 8000 times). I wrote the script below, but when I run it, Python stops working after 20 submissions and I can't get what I need. Could you please help me find the problem with my script?

import mechanize

URL = "url of the website"
br = mechanize.Browser()     # create a browser
br.set_handle_robots(False)  # ignore robots.txt
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)

def fetch(val):
    br.open(URL)                # open the page that contains the form
    br.select_form(nr=0)        # select the first form on the page
    br.set_all_readonly(False)  # must run before assigning to read-only controls
    br['subject'] = 'question'
    br['value'] = val
    resp = br.submit()          # (a br.reload() here would re-send the POST)
    data = resp.read()
    x = data.find("the answer is:")
    if x != -1:
        print(data[x:x + 100])

# val_list is defined elsewhere and contains 8000 different values

for val in val_list:
    fetch(val)
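The parsing step in fetch() does not need the network at all. Pulled out into a small function (extract_answer is a hypothetical name, not part of mechanize), it can be checked offline against a saved copy of the results page, which helps separate parsing problems from connection problems. A minimal sketch in Python 3, assuming the page has already been decoded to str (under Python 3, resp.read() returns bytes):

```python
MARKER = "the answer is:"


def extract_answer(html, marker=MARKER, length=100):
    """Return `length` characters starting at `marker`, or None if it is absent.

    Mirrors the data.find(...) / data[x:x + 100] logic inside fetch().
    """
    start = html.find(marker)
    if start == -1:
        return None
    return html[start:start + length]


# Example: slicing stops at the end of the string if fewer than 100 chars remain.
print(extract_answer("<p>the answer is: 42</p>"))
```

Returning None instead of printing lets the caller decide whether a missing marker means "no answer" or "the site sent back an error page".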

Having used mechanize in the past for a similar kind of data scraping, I'd say you're almost certainly being rate-limited by the website, as Erbureth mentioned. Websites usually have a way to monitor connections and filter out exactly the kind of traffic you're generating, and for good reason.
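If the site is throttling you, one common convention (not guaranteed for this particular site) is an HTTP 429 Too Many Requests response, sometimes with a Retry-After header giving the number of seconds to wait. A minimal sketch of a delay calculation under that assumption: the helper name retry_delay and its base/cap defaults are hypothetical choices, and the HTTP-date form of Retry-After is not handled.

```python
def retry_delay(attempt, retry_after=None, base=2.0, cap=300.0):
    """Seconds to wait before retry number `attempt` (0-based).

    If a numeric Retry-After value is given, honour it as-is.
    Otherwise fall back to exponential backoff, base * 2**attempt,
    capped at `cap` seconds. An HTTP-date Retry-After value is
    ignored here and also falls back to backoff.
    """
    if retry_after is not None:
        value = retry_after.strip()
        if value.isdigit():
            return float(value)
    return min(base * (2 ** attempt), cap)


for attempt in range(4):
    print(attempt, retry_delay(attempt))
```

How the status code and header are read depends on the HTTP client in use; the helper only turns them into a number of seconds to pass to time.sleep.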

Putting aside for a moment whatever the purpose of your script may be, and moving to your question of why it doesn't work: at the very least, I would add some delays so you're not hitting the site repeatedly in such a short time span. Put a few seconds of pause between calls and it may work. (Although then you'll have to let it run for hours.)
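At a fixed 3-second pause, 8000 submissions need 7999 pauses, i.e. about 24,000 seconds (roughly 6.7 hours) of waiting before counting the requests themselves. A minimal sketch of such a throttled loop, assuming fetch is changed to return its result rather than print it; run_throttled and its parameters are hypothetical names. It also records per-item exceptions instead of stopping, so an error at item 20 does not lose the remaining items, and the sleep function is a parameter so the loop can be tested without actually waiting.

```python
import time


def run_throttled(values, fetch, delay=3.0, sleep=time.sleep):
    """Call fetch(value) for each value, pausing `delay` seconds between calls.

    An exception from one call is recorded rather than aborting the run.
    Returns (results, failures), two dicts keyed by value.
    """
    results, failures = {}, {}
    for i, value in enumerate(values):
        if i > 0:
            sleep(delay)  # pause before every call except the first
        try:
            results[value] = fetch(value)
        except Exception as exc:  # broad on purpose: keep going, inspect later
            failures[value] = exc
    return results, failures


# Usage with the question's names (not run here):
# results, failures = run_throttled(val_list, fetch, delay=3.0)
```

Results and failures are keyed by value, which relies on the values being hashable and distinct, as the question says they are. The failures dict then lists exactly which values to retry later, for example with a longer delay.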
