Python: Restrict the code to be run for an hour
I have written a scraper that scrapes HTML and then uses an API to get some data. Since the code is very long, I haven't posted it here. I have implemented a random sleep method and use it within my code to throttle my requests. But I want to make sure I don't overrun this code, so my idea is to run it for 3-4 hours, take a breather, and then run it again. I haven't done anything like this in Python. I tried searching but I'm not really sure where to start, so some guidance would be great. If Python has a specific module for this, a link to it would be a great help.
Also, is this relevant? Do I really need this level of complication?
Suggestions for a Cron like scheduler in Python?
I have a function for every single scraping task, and a main method that calls all of those functions.
You could just note the time you started and, each time you want to run something, check that you haven't exceeded the given maximum. Something like this should get you started:
from datetime import datetime

MAX_SECONDS = 3600

# note the time you have started
start = datetime.now()
while True:
    current = datetime.now()
    diff = current - start
    # total_seconds() covers the whole interval; .seconds wraps around after one day
    if diff.total_seconds() >= MAX_SECONDS:
        # break the loop after MAX_SECONDS
        break
    # MAX_SECONDS not exceeded, run more tasks
    scrape_some_more()
Here's the link to the datetime module documentation.
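The question also asks about running for a few hours, taking a break, and then running again. A minimal sketch of that cycle is below. The names `run_for` and `run_in_cycles` are hypothetical helpers, not part of any library, and `time.monotonic()` is swapped in for `datetime.now()` because it is unaffected by system clock changes. Note that the time limit is only checked between tasks, so a single long task can overshoot the window.

```python
import time


def run_for(task, max_seconds, clock=time.monotonic):
    """Call task() repeatedly until max_seconds have elapsed; return the call count."""
    deadline = clock() + max_seconds
    calls = 0
    while clock() < deadline:
        task()
        calls += 1
    return calls


def run_in_cycles(task, run_seconds, rest_seconds, cycles, sleep=time.sleep):
    """Alternate a work window and a rest period for a fixed number of cycles."""
    total = 0
    for i in range(cycles):
        total += run_for(task, run_seconds)
        # rest between work windows, but not after the last one
        if i < cycles - 1:
            sleep(rest_seconds)
    return total
```

Assuming `scrape_some_more` is the function that performs one unit of scraping, a call such as `run_in_cycles(scrape_some_more, 3 * 3600, 30 * 60, cycles=4)` would work for three hours, rest for thirty minutes, and repeat four times.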
You can use a threading.Timer object to schedule an interrupt signal to the main thread once the time limit is exceeded:
# Python 3: the thread module was renamed to _thread
import _thread
import threading

def longjob():
    try:
        # do your job
        while True:
            print('*', end=' ')
    except KeyboardInterrupt:
        # do your cleanup
        print('ok, giving up')

def terminate():
    print('sorry, pal')
    # raises KeyboardInterrupt in the main thread
    _thread.interrupt_main()

time_limit = 5  # terminate in 5 seconds
threading.Timer(time_limit, terminate).start()
longjob()
Put this in your crontab and run it every time_limit + 2 minutes.
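The interrupt-based approach above raises KeyboardInterrupt at whatever point the main thread happens to be, which may be in the middle of a request. As an alternative, here is a cooperative sketch in which the timer only sets a flag and the loop checks it between tasks. This is a different technique from the answer's, and `run_until_stopped` is a hypothetical helper name.

```python
import threading


def run_until_stopped(task, time_limit):
    """Call task() repeatedly until time_limit seconds have passed; return the call count."""
    stop = threading.Event()
    # the timer thread only sets the flag; it never interrupts a running task
    timer = threading.Timer(time_limit, stop.set)
    timer.daemon = True
    timer.start()
    calls = 0
    try:
        while not stop.is_set():
            task()
            calls += 1
    finally:
        # make sure the timer does not outlive the loop if task() raises
        timer.cancel()
    return calls
```

The trade-off is that the current task always finishes before the loop exits, so the limit is approximate, but no cleanup handler for KeyboardInterrupt is needed.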