
Why is an infinite loop needed when using threading and a queue in Python?

I'm trying to understand how to use threading and I came across this nice example at http://www.ibm.com/developerworks/aix/library/au-threadingpython/

      #!/usr/bin/env python
      import Queue
      import threading
      import urllib2
      import time

      hosts = ["http://yahoo.com", "http://google.com", "http://amazon.com",
      "http://ibm.com", "http://apple.com"]

      queue = Queue.Queue()

      class ThreadUrl(threading.Thread):
        """Threaded Url Grab"""
        def __init__(self, queue):
          threading.Thread.__init__(self)
          self.queue = queue

        def run(self):
          while True:
            #grabs host from queue
            host = self.queue.get()

            #grabs urls of hosts and prints first 1024 bytes of page
            url = urllib2.urlopen(host)
            print url.read(1024)

            #signals to queue job is done
            self.queue.task_done()

      start = time.time()
      def main():

        #spawn a pool of threads, and pass them queue instance 
        for i in range(5):
          t = ThreadUrl(queue)
          t.setDaemon(True)
          t.start()

        #populate queue with data
        for host in hosts:
          queue.put(host)

        #wait on the queue until everything has been processed
        queue.join()

      main()
      print "Elapsed Time: %s" % (time.time() - start)

The part I don't understand is why the run method has an infinite loop:

        def run(self):
          while True:
            ... etc ...

Just for laughs I ran the program without the loop and it looks like it runs fine! So can someone explain why this loop is needed? Also how is the loop exited as there is no break statement?

Do you want the thread to perform more than one job? If not, you don't need the loop. If so, you need something that's going to make it do that. A loop is a common solution. Your sample data contains five jobs, and the program starts five threads. So you don't need any thread to do more than one job here. Try adding one more URL to your workload, though, and see what changes.
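To see that effect without touching the network, here is a minimal sketch in Python 3 (the question's code is Python 2, so the module is `queue` rather than `Queue`). The function `run_single_shot` and the letter "jobs" are made up for illustration and stand in for the URL fetches. Each worker handles exactly one job and returns, so with six jobs and five workers one job is never picked up.

```python
import queue
import threading


def run_single_shot(jobs, num_workers):
    """Start workers that each take exactly one job and then exit.

    Hypothetical helper for illustration; assumes num_workers <= len(jobs),
    otherwise the extra workers would block forever in q.get().
    """
    q = queue.Queue()
    for job in jobs:
        q.put(job)

    done = []
    lock = threading.Lock()

    def worker():
        # No loop here: take one job, record it, mark it done, return.
        job = q.get()
        with lock:
            done.append(job)
        q.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait on the threads, not the queue, so this cannot hang

    # Everything still in the queue was never processed.
    return done, q.qsize()


if __name__ == "__main__":
    processed, left_over = run_single_shot(["a", "b", "c", "d", "e", "f"], 5)
    print(len(processed), "processed;", left_over, "still in the queue")
```

The sketch waits on the threads rather than calling `q.join()`, because with one job never marked done, `q.join()` would never return.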

The loop is required as without it each worker thread terminates as soon as it completes its first task. What you want is to have the worker take another task when it finishes.

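In the code above, you create 5 worker threads, which just happens to be enough to cover the 5 URLs you are working with. If you had more than 5 URLs, you would find that only the first 5 were processed, and queue.join() would then block forever because the remaining items are never marked as done.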
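A looping worker might look like the following. This is a minimal sketch in Python 3 rather than the question's Python 2, and `run_looping` and the `upper()` call are hypothetical stand-ins for the real URL fetch. It also shows why no `break` is needed: the workers are daemon threads, so they sit blocked in `q.get()` once the queue is empty and are simply discarded when the main thread finishes and the interpreter exits.

```python
import queue
import threading


def run_looping(jobs, num_workers):
    """Process every job with a fixed pool of looping daemon workers.

    Hypothetical helper for illustration; job.upper() stands in for
    the real work (fetching a URL in the question's code).
    """
    q = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            job = q.get()  # blocks while the queue is empty
            try:
                with lock:
                    results.append(job.upper())
            finally:
                # Always mark the job done so q.join() can return.
                q.task_done()

    for _ in range(num_workers):
        # Daemon threads do not keep the program alive once main exits.
        threading.Thread(target=worker, daemon=True).start()

    for job in jobs:
        q.put(job)

    q.join()  # returns once every queued job has been marked done
    return sorted(results)


if __name__ == "__main__":
    # Seven jobs, three workers: each worker loops and takes several jobs.
    print(run_looping(["a", "b", "c", "d", "e", "f", "g"], 3))
```

Because the workers loop, three threads are enough for seven jobs, and `q.join()` returns as soon as the last job has been marked done.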
