
Stop python script in infinite loop

I'm working on a Python script that will constantly scrape data, but it will take quite a long time. Is there a safe way to stop a long running python script? The loop will run for more than 10 minutes and I need a way to stop it if I want, after it's already running.

If I execute it from a cron job, then I'm assuming it'll just run until it's finished, so how do I stop it?

Also, if I run it from a browser by just calling the file, I'm assuming that stopping the page from loading would halt it, correct?


Here's the scenario:
I have one Python script that gathers info from pages and puts it into a queue. Then I want to have another Python script, running in an infinite loop, that just checks for new items in the queue. Let's say I want the infinite loop to begin at 8am and end at 8pm. How do I accomplish this?

Let me present you an alternative. It looks like you want real-time updates for some kind of information. You could use a pub/sub interface (publish/subscribe). Since you are using python, there are plenty of possibilities.

One of them is using Redis pub/sub functionality: http://redis.io/topics/pubsub/ - and here is the corresponding python module: redis-py

- Update -

Example

Here is an example from dirkk0 ( question / answer ):

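import sys
import threading
import cmd

import redis  # the redis-py package


def monitor():
    # YOURHOST, YOURPORT and YOURPASSWORD are placeholders to fill in.
    # Keyword arguments are used because the third positional
    # parameter of redis.Redis is db, not the password.
    r = redis.Redis(host=YOURHOST, port=YOURPORT, password=YOURPASSWORD, db=0)

    channel = sys.argv[1]
    p = r.pubsub()

    p.subscribe(channel)

    print('monitoring channel', channel)
    # listen() blocks and yields each message as it arrives.
    for m in p.listen():
        print(m['data'])


class my_cmd(cmd.Cmd):
    """Simple command processor example."""

    def do_start(self, line):
        # Typing "start" at the prompt launches the monitor thread.
        my_thread.start()

    def do_EOF(self, line):
        # Ctrl-D ends the command loop; the daemon thread dies with it.
        return True

if __name__ == '__main__':
    if len(sys.argv) == 1:
        print("missing argument! please provide the channel name.")
    else:
        my_thread = threading.Thread(target=monitor, daemon=True)

        my_cmd().cmdloop()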
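
The example above only covers the subscribing side. For the scraping script, a minimal sketch of the publishing side could look like the following. The helper name publish_item and the JSON encoding are assumptions made for illustration; the only redis-py call relied on is publish(channel, message), and client is expected to be a redis.Redis instance.

```python
import json


def publish_item(client, channel, item):
    """Serialize one scraped item and publish it on the given channel.

    `client` is assumed to be a redis.Redis instance (or anything with a
    compatible publish(channel, message) method). The return value is
    whatever publish() returns; for redis-py that is the number of
    subscribers the message was delivered to.
    """
    return client.publish(channel, json.dumps(item))


# Usage, assuming redis-py is installed and a server is reachable
# (host, port and channel name are placeholders):
#   import redis
#   client = redis.Redis(host="localhost", port=6379)
#   publish_item(client, "scraped-items", {"url": "http://example.com"})
```

Passing the client in as an argument keeps the helper easy to try out with a stand-in object before pointing it at a real server.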

- Update 2 -

In addition, look at this tutorial:

http://blog.abourget.net/2011/3/31/new-and-hot-part-6-redis-publish-and-subscribe/

I guess one way to work around the issue is to have a script that performs a single pass of the loop, which would:

  1. check no other instance of the script is running
  2. look into the queue and process everything found there

Then you can run this script from cron every minute between 8 am and 8 pm. The only downside is that new items may take some time to get processed.
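
The two steps above can be sketched as follows, assuming a Unix system (fcntl is not available on Windows). The lock file path, the function names, and the use of a plain list as the queue are all hypothetical stand-ins. A crontab schedule of "* 8-19 * * *" would run such a script every minute from 08:00 through 19:59.

```python
import fcntl
import os
import tempfile

# Hypothetical lock file location; any path writable by the cron user works.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "queue_worker.lock")


def acquire_lock(path=LOCK_PATH):
    """Return a locked file handle, or None if another instance holds the lock."""
    handle = open(path, "w")
    try:
        # LOCK_NB makes flock fail immediately instead of waiting.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


def drain(queue, handle_item):
    """Process everything currently in the queue; return how many items were handled."""
    count = 0
    while queue:
        handle_item(queue.pop(0))
        count += 1
    return count


def run_once(queue, handle_item, lock_path=LOCK_PATH):
    """One cron-triggered pass: skip if already running, otherwise empty the queue."""
    lock = acquire_lock(lock_path)
    if lock is None:
        return None  # step 1: another instance is still working
    try:
        return drain(queue, handle_item)  # step 2: process what is there
    finally:
        lock.close()  # closing the file releases the lock
```

Because the operating system drops the lock when the process exits, a crashed run does not leave a stale lock behind, which is the main reason for preferring flock over a hand-written PID file here.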

I think stopping the browser page from loading does not necessarily stop the Python script. I suggest that you start your script under the control of a parent process using fork:

  • Example:

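import os, time, signal

# Demo value in seconds; use 600 for a 10-minute limit.
TIMEOUT_SECONDS = 2

def child():
    print('A new child', os.getpid())
    time.sleep(5)  # stand-in for the long-running work
    os._exit(0)

def parent():
    while True:
        newpid = os.fork()
        if newpid == 0:
            child()
        else:
            print("parent: %d, child: %d" % (os.getpid(), newpid))
            print("start counting time for child process...!")
            start = time.monotonic()
            while True:
                time.sleep(0.1)  # poll instead of busy-waiting
                # Stop waiting if the child already exited on its own.
                if os.waitpid(newpid, os.WNOHANG)[0] != 0:
                    break
                # Kill the child once it exceeds the time limit.
                if time.monotonic() - start >= TIMEOUT_SECONDS:
                    os.kill(newpid, signal.SIGKILL)
                    os.waitpid(newpid, 0)  # reap the killed child
                    break

        # Press Enter to start another child, or type q to quit.
        if input() == 'q':
            break

parent()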
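
SIGKILL ends the child immediately and gives it no chance to clean up. A gentler variant, sketched here under the assumption that the worker loop can check a flag between iterations, is to send SIGTERM and let the script finish its current item. The names StopFlag, within_window, run_loop and install_handlers are hypothetical, and the 8-to-20 window simply mirrors the hours mentioned in the question.

```python
import signal
from datetime import datetime, time as dtime


class StopFlag:
    """Remembers that a stop was requested (hypothetical helper)."""

    def __init__(self):
        self.stop = False

    def request_stop(self, signum=None, frame=None):
        # The (signum, frame) signature is what signal.signal() passes
        # to a handler; both arguments are ignored here.
        self.stop = True


def within_window(now, start=dtime(8, 0), end=dtime(20, 0)):
    """True while the wall-clock time is in the half-open range [start, end)."""
    return start <= now.time() < end


def run_loop(flag, step, clock=datetime.now):
    """Call step() until a stop is requested or the time window closes."""
    done = 0
    while not flag.stop and within_window(clock()):
        step()
        done += 1
    return done


def install_handlers(flag):
    """Route SIGTERM and Ctrl-C to the flag; must be called from the main thread."""
    signal.signal(signal.SIGTERM, flag.request_stop)
    signal.signal(signal.SIGINT, flag.request_stop)
```

With the handlers installed, running kill with the script's PID from a shell, or calling os.kill(pid, signal.SIGTERM) from a parent process, lets the loop end after the current step instead of being cut off mid-item.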
