Stop Python script in infinite loop
I'm working on a Python script that will constantly scrape data, but it will take quite a long time. Is there a safe way to stop a long-running Python script? The loop will run for more than 10 minutes, and I need a way to stop it on demand once it is already running.
If I execute it from a cron job, I assume it will just run until it's finished, so how do I stop it?
Also, if I run it from a browser by just calling the file, I assume that stopping the page from loading would halt it, correct?
Here's the scenario:
I have one Python script that gathers info from pages and puts it into a queue. Then I want to have another Python script, running in an infinite loop, that just checks for new items in the queue. Let's say I want the infinite loop to begin at 8 am and end at 8 pm. How do I accomplish this?
Let me present an alternative. It looks like you want real-time updates for some kind of information. You could use a pub/sub (publish/subscribe) interface. Since you are using Python, there are plenty of possibilities.
One of them is Redis's pub/sub functionality: http://redis.io/topics/pubsub/ - and here is the corresponding Python module: redis-py
- Update -
Here is an example from dirkk0 (question / answer):
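import cmd
import sys
import threading

import redis  # redis-py; this import was missing from the original snippet


def monitor():
    # YOURHOST, YOURPORT and YOURPASSWORD are placeholders to fill in.
    # Keyword arguments are used because the third positional parameter is db, not password.
    r = redis.Redis(host=YOURHOST, port=YOURPORT, password=YOURPASSWORD, db=0)
    channel = sys.argv[1]
    p = r.pubsub()
    p.subscribe(channel)
    print('monitoring channel', channel)
    # listen() blocks and yields each message as it arrives.
    for m in p.listen():
        print(m['data'])


class my_cmd(cmd.Cmd):
    """Simple command processor example."""

    def do_start(self, line):
        my_thread.start()

    def do_EOF(self, line):
        # Returning True ends cmdloop(); the daemon thread stops with the process.
        return True


if __name__ == '__main__':
    if len(sys.argv) == 1:
        print("missing argument! please provide the channel name.")
    else:
        my_thread = threading.Thread(target=monitor)
        my_thread.daemon = True
        my_cmd().cmdloop()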
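The example above covers only the subscribing side. For the scraping script, the publishing side might look roughly like the sketch below. It is an illustration, not part of the original answer: the function name `publish_items`, the channel name `scraped`, and the JSON encoding are all assumptions. The client is passed in so that the function works with any object exposing redis-py's `publish(channel, message)` method.

```python
import json


def publish_items(client, channel, items):
    """Publish each item to `channel` as a JSON string; return how many were sent."""
    sent = 0
    for item in items:
        # With redis-py, publish() returns the number of subscribers
        # that received the message; that value is ignored here.
        client.publish(channel, json.dumps(item))
        sent += 1
    return sent


# Hypothetical usage (needs the redis package and a reachable Redis server):
#   import redis
#   client = redis.Redis(host="localhost", port=6379, db=0)
#   publish_items(client, "scraped", [{"url": "http://example.com"}])
```

Note that Redis pub/sub does not store messages: anything published while no subscriber is listening is lost, so this suits live updates rather than a durable work queue.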
- Update 2 -
In addition, look at this tutorial:
http://blog.abourget.net/2011/3/31/new-and-hot-part-6-redis-publish-and-subscribe/
I guess one way to work around the issue is to have a script that performs a single run of the loop, which would:
You can then run this script from cron every minute between 8 am and 8 pm. The only downside is that new items may take some time to get processed.
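A single-pass script of this kind could be sketched as follows. This is an assumption about what such a script might do, not the answerer's code: `process_pending`, `handle`, and `max_items` are hypothetical names, and `queue.Queue` only stands in for whatever shared queue the two scripts really use (an in-process `queue.Queue` is not visible across separate processes, so a real setup would need a database table, a Redis list, or similar).

```python
import queue


def process_pending(pending, handle, max_items=100):
    """Handle the items already waiting in `pending`, then return the count.

    The function never blocks waiting for new items, so each cron
    invocation finishes quickly instead of looping forever.
    """
    handled = 0
    while handled < max_items:
        try:
            item = pending.get_nowait()
        except queue.Empty:
            break  # nothing left; the next cron run picks up new items
        handle(item)
        handled += 1
    return handled


# Hypothetical crontab entry: every minute from 08:00 through 19:59.
#   * 8-19 * * * /usr/bin/python3 /path/to/process_queue.py
```

Because each run exits on its own, there is no long-running process to stop; removing or commenting out the crontab entry is enough to halt processing.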
I think stopping the browser page from loading does not necessarily stop the Python script. I suggest that you start your script under the control of a parent process using fork:
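import os, signal, time

TIME_LIMIT = 2  # seconds, kept short for demonstration; use 600 for 10 minutes


def child():
    print('A new child', os.getpid())
    time.sleep(5)  # stand-in for the long-running work
    os._exit(0)


def parent():
    while True:
        newpid = os.fork()
        if newpid == 0:
            child()
        else:
            print("parent: %d, child: %d" % (os.getpid(), newpid))
            print("start counting time for child process...!")
            # time.clock() was removed in Python 3.8 and measured CPU time on Unix;
            # time.monotonic() measures elapsed wall-clock time.
            start = time.monotonic()
            while True:
                time.sleep(0.1)  # avoid a busy loop
                # Kill the child once it exceeds the time limit.
                if time.monotonic() - start >= TIME_LIMIT:
                    os.kill(newpid, signal.SIGKILL)
                    os.waitpid(newpid, 0)  # reap the child so it does not linger as a zombie
                    break
        # Press Enter to start another child, or type q to quit.
        if input() == 'q':
            break


parent()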
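The same parent-controls-child idea can also be expressed with the standard `subprocess` module, which works on platforms without `os.fork()`. The sketch below is an alternative to the fork approach above, not a restatement of it; the helper names `run_with_timeout` and `run_script_with_timeout` are hypothetical.

```python
import subprocess
import sys


def run_with_timeout(args, limit_seconds):
    """Run a command; return its exit code, or None if it exceeded the limit."""
    try:
        completed = subprocess.run(args, timeout=limit_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child and waits for it before
        # raising, so no manual cleanup is needed here.
        return None
    return completed.returncode


def run_script_with_timeout(script_path, limit_seconds):
    """Run a Python script with the current interpreter under a time limit."""
    return run_with_timeout([sys.executable, script_path], limit_seconds)
```

For example, `run_script_with_timeout("scraper.py", 600)` would give a script named `scraper.py` (an assumed filename) at most 10 minutes before it is killed.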