
Python closing multiple Threads

I am using the example below from the IBM website. I noticed that the DatamineThread() and ThreadUrl() threads never exit because of their while-loops.

I am trying to terminate these threads and print a message when each one finishes. I am not sure whether I am going about this the right way, or even whether the threads need to be terminated like this. The problem is that after I set run = False in main(), the while-loops still behave as if run = True.

Any help would be great... Thanks

import Queue
import threading
import urllib2
import time
from BeautifulSoup import BeautifulSoup

hosts = ["http://yahoo.com", "http://google.com", "http://amazon.com",
        "http://ibm.com", "http://apple.com"]

queue = Queue.Queue()
out_queue = Queue.Queue()
run = True

class ThreadUrl(threading.Thread):
    """Threaded Url Grab"""
    def __init__(self, queue, out_queue):
        threading.Thread.__init__(self)
        self.queue = queue
        self.out_queue = out_queue

    def run(self):
        global run
        while run:
            #grabs host from queue
            host = self.queue.get()

            #grabs urls of hosts and then grabs chunk of webpage
            url = urllib2.urlopen(host)
            chunk = url.read()

            #place chunk into out queue
            self.out_queue.put(chunk)

            #signals to queue job is done
            self.queue.task_done()

        print 'ThreadUrl finished...'


class DatamineThread(threading.Thread):
    """Threaded chunk parser"""
    def __init__(self, out_queue):
        threading.Thread.__init__(self)
        self.out_queue = out_queue

    def run(self):
        global run
        while run:
            #grabs chunk from out queue
            chunk = self.out_queue.get()

            #parse the chunk
            soup = BeautifulSoup(chunk)
            print soup.findAll(['title'])

            #signals to queue job is done
            self.out_queue.task_done()

        print 'DatamineThread finished...'

start = time.time()
def main():
    global run
    #spawn a pool of threads, and pass them queue instance
    for i in range(5):
        t = ThreadUrl(queue, out_queue)
        t.setDaemon(True)
        t.start()

    #populate queue with data
    for host in hosts:
        queue.put(host)

    for i in range(5):
        dt = DatamineThread(out_queue)
        dt.setDaemon(True)
        dt.start()


    #wait on the queue until everything has been processed
    queue.join()
    out_queue.join()

    # try and break while-loops in threads
    run = False

    time.sleep(5)


main()
print "Elapsed Time: %s" % (time.time() - start)

I'm not personally a huge fan of global variables for thread conditions, in large part because I've seen what you're encountering before. The cause is described in the Python docs for Queue.get:

If optional args block is true and timeout is None (the default), block if necessary until an item is available.

Essentially, the threads never reach a second check of while run: because queue.get() and out_queue.get() block indefinitely once the Queues are empty.
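
To see the blocking behaviour in isolation, here is a minimal sketch. It assumes Python 3, where the module is named queue rather than Queue, and it uses a threading.Event in place of the global run flag; the names work, keep_running and passed_check are made up for the illustration.

```python
import queue      # Python 3 name; the question's Python 2 code calls it Queue
import threading

work = queue.Queue()
keep_running = threading.Event()   # hypothetical stand-in for the global `run`
keep_running.set()
passed_check = threading.Event()   # set once the worker is past the flag check

def worker():
    while keep_running.is_set():
        passed_check.set()
        work.get()                 # blocks for as long as the queue is empty
        work.task_done()

t = threading.Thread(target=worker, daemon=True)
t.start()

passed_check.wait()                # worker is now at (or inside) get()
keep_running.clear()               # same idea as `run = False` in main()
t.join(timeout=0.5)
still_blocked = t.is_alive()       # the flag changed, but nobody rechecked it

work.put("wake up")                # any item unblocks get(); the loop then ends
t.join(timeout=2)
exited_after_item = not t.is_alive()
```

The thread only notices the cleared flag after something is put on the queue, which is why changing run in main() appears to have no effect.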

The better way to do this, IMHO, is either to use sentinel values in the Queues, or to use get_nowait and catch the Queue.Empty exception to break the loop. Examples:

Sentinels

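class DatamineThread(threading.Thread):
    def run(self):
        while True:
            data = self.out_queue.get()
            if data == "time to quit":
                # mark the sentinel as done so out_queue.join() can return
                self.out_queue.task_done()
                break
            # non-sentinel processing here.
            self.out_queue.task_done()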

Try / Except

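class DatamineThread(threading.Thread):
    def run(self):
        while True:
            try:
                data = self.out_queue.get_nowait()  # same as out_queue.get(False)
            except Queue.Empty:
                # the queue is empty right now, so stop; only safe once
                # every producer has finished adding items
                break
            # data processing here.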
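
A runnable sketch of the try / except variant, assuming Python 3 (module queue instead of Queue) and assuming all work is queued before the workers start, so that an empty queue really does mean "finished". The function name drain and the doubling step are placeholders, not part of the original code.

```python
import queue
import threading

out_queue = queue.Queue()
for n in range(20):
    out_queue.put(n)               # all work is queued before any worker starts

results = []
results_lock = threading.Lock()

def drain():
    while True:
        try:
            data = out_queue.get_nowait()
        except queue.Empty:
            break                  # nothing left: leave the loop, thread ends
        with results_lock:
            results.append(data * 2)   # placeholder for real processing
        out_queue.task_done()

workers = [threading.Thread(target=drain) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()                       # returns because every worker broke out
```

If producers are still running, a worker can hit an empty queue too early and quit, which is the main reason to prefer sentinels in a pipeline like the one in the question.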

To ensure that all of your threads end, you can do it a couple of ways:

Add Sentinels for each Worker

for i in range(numWorkers):
    out_queue.put('time to quit')

# returns only if each worker calls task_done() for its sentinel too
out_queue.join()
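
Put together, the one-sentinel-per-worker approach might look like the sketch below. It assumes Python 3 and uses a unique object() as the sentinel instead of the string above; SENTINEL, NUM_WORKERS and consume are illustrative names only.

```python
import queue
import threading

SENTINEL = object()                # hypothetical marker; the answer uses a string
NUM_WORKERS = 3

out_queue = queue.Queue()
processed = []
lock = threading.Lock()

def consume():
    while True:
        data = out_queue.get()
        if data is SENTINEL:
            out_queue.task_done()  # count the sentinel, or join() never returns
            break
        with lock:
            processed.append(data.upper())   # placeholder processing
        out_queue.task_done()

workers = [threading.Thread(target=consume) for _ in range(NUM_WORKERS)]
for w in workers:
    w.start()

for word in ["a", "b", "c", "d"]:
    out_queue.put(word)
for _ in range(NUM_WORKERS):
    out_queue.put(SENTINEL)        # exactly one sentinel per worker

out_queue.join()                   # all items and sentinels accounted for
for w in workers:
    w.join()
```

Each worker stops after taking one sentinel, so putting exactly NUM_WORKERS of them guarantees every thread gets one.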

Replace the Sentinel

class DatamineThread(threading.Thread):
    def run(self):
        while True:
            data = self.out_queue.get()
            if data == "time to quit":
                # pass the sentinel on so the next worker sees it as well
                self.out_queue.put('time to quit')
                break
            # non-sentinel processing here.
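
A self-contained sketch of the replace-the-sentinel idea, assuming Python 3. The names consume, seen and leftover are invented for the example, and the threads are joined directly because the re-queued sentinel would keep out_queue.join() waiting.

```python
import queue
import threading

SENTINEL = "time to quit"
out_queue = queue.Queue()
seen = []
lock = threading.Lock()

def consume():
    while True:
        data = out_queue.get()
        if data == SENTINEL:
            out_queue.put(SENTINEL)   # hand the sentinel to the next worker
            break
        with lock:
            seen.append(data)

workers = [threading.Thread(target=consume) for _ in range(4)]
for w in workers:
    w.start()

for n in range(10):
    out_queue.put(n)
out_queue.put(SENTINEL)               # a single sentinel, whatever the pool size

for w in workers:
    w.join()                          # wait on the threads, not on the queue

leftover = out_queue.get_nowait()     # the last worker's copy is still queued
```

Because the Queue is FIFO, every real item is taken before the sentinel is first seen, and one copy of the sentinel is left behind when the last worker exits.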

Either way should work. Which is preferred depends on how the out_queue is populated. If worker threads can add items to it or remove items from it, the first approach is preferable: call join(), then add the sentinels, then call join() again. The second approach is good if you don't want to remember how many worker threads you created: it uses a single sentinel value instead of filling the Queue with one per worker. Note that this one sentinel stays in the Queue after the last worker exits, so wait on the threads themselves rather than on out_queue.join().
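
For the two-queue layout in the question, the join / sentinels / join sequence could be sketched roughly as follows. This assumes Python 3, replaces the urlopen() and BeautifulSoup steps with a trivial upper() call so it runs offline, and uses hypothetical names (in_queue, fetch_stage, parse_stage, SENTINEL).

```python
import queue
import threading

SENTINEL = None                    # hypothetical marker; no real item is None
NUM_WORKERS = 2

in_queue = queue.Queue()
out_queue = queue.Queue()
titles = []
lock = threading.Lock()

def fetch_stage():
    # Stands in for ThreadUrl; upper() replaces the real urlopen() call.
    while True:
        host = in_queue.get()
        if host is SENTINEL:
            in_queue.task_done()
            break
        out_queue.put(host.upper())
        in_queue.task_done()

def parse_stage():
    # Stands in for DatamineThread.
    while True:
        chunk = out_queue.get()
        if chunk is SENTINEL:
            out_queue.task_done()
            break
        with lock:
            titles.append(chunk)
        out_queue.task_done()

threads = [threading.Thread(target=fetch_stage) for _ in range(NUM_WORKERS)]
threads += [threading.Thread(target=parse_stage) for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()

for host in ["yahoo", "google", "ibm"]:
    in_queue.put(host)

in_queue.join()                    # every host fetched, so out_queue is complete
out_queue.join()                   # every chunk parsed
for _ in range(NUM_WORKERS):       # only now add the sentinels
    in_queue.put(SENTINEL)
    out_queue.put(SENTINEL)
in_queue.join()                    # second join: the sentinels were consumed
out_queue.join()
for t in threads:
    t.join()
```

Adding the sentinels only after the first pair of join() calls keeps a parser from quitting while a fetcher is still producing chunks.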
