
Threading in python using queue

I wanted to use threading in Python to download a lot of webpages, and came across the following code, which uses queues, on one of the websites.

It uses an infinite while loop. Does each thread run continuously, without ending, until all of them are complete? Am I missing something?

#!/usr/bin/env python
import Queue
import threading
import urllib2
import time

hosts = ["http://yahoo.com", "http://google.com", "http://amazon.com",
"http://ibm.com", "http://apple.com"]

queue = Queue.Queue()

class ThreadUrl(threading.Thread):
  """Threaded Url Grab"""
  def __init__(self, queue):
    threading.Thread.__init__(self)
    self.queue = queue

  def run(self):
    while True:
      #grabs host from queue
      host = self.queue.get()

      #grabs urls of hosts and prints first 1024 bytes of page
      url = urllib2.urlopen(host)
      print url.read(1024)

      #signals to queue job is done
      self.queue.task_done()

start = time.time()
def main():

  #spawn a pool of threads, and pass them queue instance 
  for i in range(5):
    t = ThreadUrl(queue)
    t.setDaemon(True)
    t.start()

  #populate queue with data   
  for host in hosts:
    queue.put(host)

  #wait on the queue until everything has been processed     
  queue.join()

main()
print "Elapsed Time: %s" % (time.time() - start)

Setting the threads to be daemon threads causes them to exit when main is done. But yes, you are correct in that your threads will run continuously for as long as there is something in the queue; otherwise they will block.

The documentation explains this detail: Queue docs.

The Python threading documentation explains the daemon part as well:

The entire Python program exits when no alive non-daemon threads are left.

So, when the queue is emptied, queue.join resumes; when the interpreter then exits, the threads will die.
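For reference, here is a minimal Python 3 sketch of the same daemon-worker pattern — queue.Queue and the daemon= keyword replace the Python 2 Queue.Queue and setDaemon, and the network fetch is replaced with a stand-in transformation so the sketch runs without a connection:

```python
import queue
import threading

q = queue.Queue()
results = []
lock = threading.Lock()

def worker():
    while True:
        host = q.get()                    # blocks until an item is available
        with lock:
            results.append(host.upper())  # stand-in for the real download
        q.task_done()                     # tell the queue this item is finished

# spawn a pool of daemon workers; they die when the main thread exits
for _ in range(5):
    threading.Thread(target=worker, daemon=True).start()

for host in ["yahoo", "google", "amazon"]:
    q.put(host)

q.join()  # blocks until task_done has been called once per put
print(results)
```

The workers are still stuck in their infinite loops after q.join() returns, but because they are daemons the interpreter does not wait for them at exit.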

EDIT: Correction on default behavior for Queue

Your script works fine for me, so I assume you are asking what is going on so you can understand it better. Yes, your subclass puts each thread in an infinite loop, waiting for something to be put in the queue. When something is found, the thread grabs it and does its thing. Then, the critical part: it notifies the queue that it's done with queue.task_done, and resumes waiting for another item in the queue.

While all this is going on in the worker threads, the main thread is waiting (join) until all the tasks in the queue are done, which will be when the threads have called queue.task_done the same number of times as there are messages in the queue. At that point the main thread finishes and exits. Since these are daemon threads, they close down too.

This is cool stuff, threads and queues. It's one of the really good parts of Python. You will hear all kinds of stuff about how threading in Python is messed up by the GIL and such. But if you know where to use threads (like in this case, with network I/O), they will really speed things up for you. The general rule is: if you are I/O bound, try and test threads; if you are CPU bound, threads are probably not a good idea — maybe try processes instead.
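As a side note, on Python 3 the same I/O-bound pool can be written with concurrent.futures, which handles the queue and worker threads for you. A sketch with the download replaced by a placeholder function so it runs offline:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(host):
    # placeholder for urllib.request.urlopen(host).read(1024)
    return len(host)

hosts = ["http://yahoo.com", "http://google.com", "http://amazon.com"]

# map() distributes the hosts across 5 worker threads and
# returns the results in the original order
with ThreadPoolExecutor(max_workers=5) as pool:
    sizes = list(pool.map(fetch, hosts))

print(sizes)
```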

Good luck,

Mike

I don't think Queue is necessary in this case. Using only Thread:

import threading, urllib2, time

hosts = ["http://yahoo.com", "http://google.com", "http://amazon.com",
"http://ibm.com", "http://apple.com"]

class ThreadUrl(threading.Thread):
    """Threaded Url Grab"""
    def __init__(self, host):
        threading.Thread.__init__(self)
        self.host = host

    def run(self):
        #grabs urls of hosts and prints first 1024 bytes of page
        url = urllib2.urlopen(self.host)
        print url.read(1024)

start = time.time()
def main():
    #spawn a pool of threads
    for i in range(len(hosts)):
        t = ThreadUrl(hosts[i])
        t.start()

main()
print "Elapsed Time: %s" % (time.time() - start)
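One caveat with this Thread-only version: nothing joins the threads, so the elapsed-time line can print before the downloads have actually finished. A small sketch (with a sleep standing in for the network fetch) of joining each thread before reading the clock:

```python
import threading
import time

def fake_grab(host, out):
    time.sleep(0.05)   # stand-in for the network fetch
    out.append(host)   # list.append is thread-safe in CPython

start = time.time()
out = []
threads = [threading.Thread(target=fake_grab, args=(h, out))
           for h in ["a", "b", "c"]]
for t in threads:
    t.start()
for t in threads:
    t.join()           # wait for every thread before reading the clock
elapsed = time.time() - start
print(len(out), "pages done in", elapsed)
```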
