Stop python script in infinite loop

I'm working on a Python script that will constantly scrape data, but it will take quite a long time. Is there a safe way to stop a long-running Python script? The loop will run for more than 10 minutes, and I need a way to stop it on demand after it has already started.

If I execute it from a cron job, I assume it will just run until it finishes, so how do I stop it?

Also, if I run it from a browser by just calling the file, I assume that stopping the page from loading would halt it, correct?


Here's the scenario:
I have one Python script that gathers info from pages and puts it into a queue. Then I want another Python script, running in an infinite loop, that just checks for new items in the queue. Let's say I want the infinite loop to begin at 8 am and end at 8 pm. How do I accomplish this?

Let me present an alternative. It looks like you want real-time updates for some kind of information. You could use a pub/sub (publish/subscribe) interface. Since you are using Python, there are plenty of possibilities.

One of them is the Redis pub/sub functionality: http://redis.io/topics/pubsub/ and the corresponding Python module is redis-py.

- Update -

Example

Here is an example from dirkk0 (question / answer):

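import sys
import threading

import cmd

import redis  # provided by the redis-py package


def monitor():
    # YOURHOST, YOURPORT and YOURPASSWORD are placeholders for your own settings.
    r = redis.Redis(host=YOURHOST, port=YOURPORT, password=YOURPASSWORD, db=0)

    channel = sys.argv[1]
    p = r.pubsub()

    p.subscribe(channel)

    print('monitoring channel', channel)
    # listen() blocks and yields each message as it arrives.
    for m in p.listen():
        print(m['data'])


class my_cmd(cmd.Cmd):
    """Simple command processor example."""

    def do_start(self, line):
        # Typing "start" at the prompt launches the monitor thread.
        my_thread.start()

    def do_EOF(self, line):
        # Ctrl-D leaves the command loop; the daemon thread ends with the process.
        return True

if __name__ == '__main__':
    if len(sys.argv) == 1:
        print("missing argument! please provide the channel name.")
    else:
        my_thread = threading.Thread(target=monitor, daemon=True)

        my_cmd().cmdloop()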
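
The example above only covers the subscribing side. For the scraping script, a minimal sketch of the publishing side might look like the following. The function name publish_items and the choice of JSON as the message format are assumptions for illustration, not part of the original answer; client is assumed to be a redis-py client, whose publish(channel, message) method sends one message to a channel.

```python
import json


def publish_items(client, channel, items):
    """Publish each scraped item to `channel` as JSON; return how many were sent.

    `client` is assumed to be a redis-py client, for example
    redis.Redis(host=..., port=...), but any object that exposes a
    publish(channel, message) method will work.
    """
    sent = 0
    for item in items:
        # Serialize to a string because pub/sub messages are plain payloads.
        client.publish(channel, json.dumps(item))
        sent += 1
    return sent
```

The scraper would call this with whatever it collected on each pass, using the same channel name that the monitoring script was started with.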

- Update 2 -

In addition, look at this tutorial:

http://blog.abourget.net/2011/3/31/new-and-hot-part-6-redis-publish-and-subscribe/

I guess one way to work around the issue is to have a script that performs a single pass of the loop. It would:

  1. check that no other instance of the script is running
  2. look into the queue and process everything found there

You can then run this script from cron every minute between 8 am and 8 pm. The only downside is that new items may take some time to get processed.
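
A minimal sketch of such a single-pass script, assuming a POSIX system, could use an exclusive lock file for the "no other instance" check. The lock path, the function names, and the plain list standing in for the real queue are all hypothetical. A crontab schedule such as `* 8-19 * * *` would start it every minute from 08:00 through 19:59.

```python
import fcntl
import os
import tempfile

# Hypothetical lock path; any location writable by the cron user would do.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "queue_worker.lock")


def acquire_lock(path=LOCK_PATH):
    """Return an open file holding an exclusive lock, or None if it is taken."""
    handle = open(path, "w")
    try:
        # LOCK_NB makes flock fail immediately instead of waiting.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


def drain(queue_items, process):
    """Process everything currently queued; return the number of items handled."""
    count = 0
    while queue_items:
        process(queue_items.pop(0))
        count += 1
    return count


def run_once(queue_items, process, path=LOCK_PATH):
    """One cron invocation: skip if another run is active, else drain the queue."""
    lock = acquire_lock(path)
    if lock is None:
        return 0  # a previous run is still busy; let it finish
    try:
        return drain(queue_items, process)
    finally:
        lock.close()  # closing the file releases the lock
```

Because the operating system drops the lock when the process exits, a crashed run does not block later ones, which is the main reason to prefer a lock over a PID file here.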

I think stopping the browser page does not necessarily stop the Python script. I suggest starting your script under the control of a parent process using fork:

  • Example:

import os, signal, time

def child():
    print('A new child', os.getpid())
    time.sleep(5)
    os._exit(0)

def parent():
    while True:
        newpid = os.fork()
        if newpid == 0:
            child()
        else:
            print("parent: %d, child: %d" % (os.getpid(), newpid))
            print("start counting time for child process...!")
            # time.monotonic() measures elapsed wall-clock time (time.clock() was removed in Python 3.8).
            start = time.monotonic()
            while True:
                time.sleep(0.1)  # poll instead of spinning at full CPU
                # Kill the child once it exceeds the limit (2 seconds in this demo; use 600 for 10 minutes).
                if time.monotonic() - start >= 2:
                    os.kill(newpid, signal.SIGKILL)
                    os.waitpid(newpid, 0)  # reap the child so it does not remain a zombie
                    break

        # Enter 'q' to quit; any other input starts a new child.
        if input() == 'q':
            break

parent()
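
The same parent-controls-child idea can also be sketched with the standard subprocess module instead of a raw fork. This is an alternative illustration rather than the answer's own method; the function name run_with_timeout and the grace period are assumptions. It first asks the child to stop with terminate(), and only falls back to kill() if the child ignores the request.

```python
import subprocess
import sys


def run_with_timeout(argv, limit_seconds, grace_seconds=5.0):
    """Run argv as a child process and stop it if it exceeds limit_seconds.

    Returns the child's exit code (negative on POSIX if a signal ended it).
    """
    proc = subprocess.Popen(argv)
    try:
        return proc.wait(timeout=limit_seconds)
    except subprocess.TimeoutExpired:
        proc.terminate()  # SIGTERM on POSIX: gives the child a chance to clean up
        try:
            return proc.wait(timeout=grace_seconds)
        except subprocess.TimeoutExpired:
            proc.kill()  # SIGKILL on POSIX: cannot be caught or ignored
            return proc.wait()
```

For example, `run_with_timeout([sys.executable, "scraper.py"], 600)` would give a hypothetical scraper.py ten minutes before stopping it.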
