
Python multithreading using queue

I was reading an article on Python multithreading using queues and have a basic question.

Based on the print statements, 5 threads are started as expected. So how does the queue work?

1. The thread is started initially. When the queue is populated with an item, does the thread get restarted and start processing that item?
2. If we use the queue system and the threads process the queue item by item, how does that improve performance? Isn't it similar to serial processing, i.e. one by one?

from queue import Queue
import threading
import time
import urllib.request

hosts = ["http://yahoo.com", "http://google.com", "http://amazon.com",
         "http://ibm.com", "http://apple.com"]

queue = Queue()

class ThreadUrl(threading.Thread):

    def __init__(self, queue):
        threading.Thread.__init__(self)
        print('threads are created')
        self.queue = queue

    def run(self):
        while True:
            print('thread starting to run')

            # grab a host from the queue (blocks until one is available)
            host = self.queue.get()

            try:
                # fetch the page and print its first 20 bytes
                url = urllib.request.urlopen(host)
                print('host=%s, threadname=%s' % (host, self.name))
                print(url.read(20))
            except OSError as exc:
                # a failed request must not kill the worker thread
                print('host=%s failed: %s' % (host, exc))
            finally:
                # signal to the queue that this job is done, even on failure,
                # so that queue.join() cannot hang
                self.queue.task_done()

if __name__ == '__main__':
    start = time.time()
    print('program start')

    # spawn a pool of threads and pass them the queue instance
    for i in range(5):
        t = ThreadUrl(queue)
        t.daemon = True
        t.start()

    # populate the queue with data
    for host in hosts:
        queue.put(host)

    # wait on the queue until everything has been processed
    queue.join()

    print("Elapsed Time: %s" % (time.time() - start))

A queue is similar to a list container, but with internal locking that makes it a thread-safe way to communicate data between threads.
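To illustrate the thread-safety point, here is a minimal sketch, assuming Python 3 (where the module is named `queue`). The helper names `produce` and `run_producers` and the item counts are made up for this example. Several threads put items on one shared queue at the same time without any extra locking, and no item is lost.

```python
import queue
import threading


def produce(q, producer_id, count):
    """Put `count` tagged items on the shared queue (hypothetical helper)."""
    for i in range(count):
        q.put((producer_id, i))


def run_producers(num_producers=4, items_each=1000):
    """Start several producer threads and return everything they queued."""
    q = queue.Queue()
    threads = [
        threading.Thread(target=produce, args=(q, pid, items_each))
        for pid in range(num_producers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # All producers have finished, so draining from one thread is safe here.
    items = []
    while not q.empty():
        items.append(q.get())
    return items


items = run_producers()
print(len(items))  # 4000
```

The queue does its own locking inside `put()` and `get()`, so the producer code needs none.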

When you start all of your threads, they all block on the self.queue.get() call, waiting to pull an item from the queue. When an item is put into the queue from your main thread, one of the threads becomes unblocked and receives the item. Nothing is restarted: the thread was started once and was simply waiting. It then processes the item until it finishes, and goes back to blocking on the next get() call.
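The blocking behaviour can be observed directly. The following is a small sketch, assuming Python 3. The `worker` function, the `"job-1"` item, and the `None` shutdown sentinel are illustrative choices, not part of the question's code. A single worker thread is started once, sits inside `get()` while the main thread sleeps, and wakes up only when an item arrives.

```python
import queue
import threading
import time


def worker(q, log):
    """Pull items forever; record how long each get() call blocked."""
    while True:
        started_waiting = time.monotonic()
        item = q.get()  # blocks here until an item is available
        waited = time.monotonic() - started_waiting
        try:
            if item is None:  # sentinel (an assumption of this sketch): stop
                return
            log.append((item, waited))
        finally:
            q.task_done()  # always balance the get(), even when stopping


q = queue.Queue()
log = []
t = threading.Thread(target=worker, args=(q, log))
t.start()           # the thread is started exactly once

time.sleep(0.2)     # the worker is blocked inside q.get() during this time
q.put("job-1")      # wakes the worker up
q.put(None)         # ask the worker to exit
q.join()            # wait until both items are marked done
t.join()

item, waited = log[0]
print("%s received after waiting about %.1f s" % (item, waited))
```

The same thread that was started at the top handles the item, so there is no restart involved.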

All of your threads can run concurrently because each of them can receive items from the queue. This is where you see the improvement in performance: if urlopen and read take time in one thread because it is waiting on I/O, another thread can do work in the meantime. The queue object's job is simply to manage locked access and hand items out to the callers.
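The difference from serial processing can be measured with a rough sketch like the one below, assuming Python 3. To keep it runnable without network access, the real `urlopen` call is replaced by a hypothetical `fake_fetch` that just sleeps. The host names and the 0.2 s delay are invented for the example. Sleeping releases the interpreter lock much as a blocking network read does, so other threads can run while one is waiting.

```python
import queue
import threading
import time

HOSTS = ["host-%d" % i for i in range(5)]  # hypothetical host names
DELAY = 0.2                                # simulated I/O wait per request


def fake_fetch(host):
    """Stand-in for urlopen(host).read(): waits, then returns some data."""
    time.sleep(DELAY)
    return host.upper()


def run_serial(hosts):
    """Process the hosts one by one in the calling thread."""
    return [fake_fetch(h) for h in hosts]


def run_threaded(hosts, num_threads=5):
    """Process the hosts with a pool of worker threads fed by a queue."""
    q = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            host = q.get()
            try:
                page = fake_fetch(host)
                with lock:
                    results.append(page)
            finally:
                q.task_done()

    for _ in range(num_threads):
        threading.Thread(target=worker, daemon=True).start()
    for h in hosts:
        q.put(h)
    q.join()  # returns once every queued host has been processed
    return results


def timed(func, *args):
    """Return (result, elapsed seconds) for one call of func."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


serial_result, serial_time = timed(run_serial, HOSTS)
threaded_result, threaded_time = timed(run_threaded, HOSTS)
print("serial:   %.2f s" % serial_time)
print("threaded: %.2f s" % threaded_time)
```

The serial run has to wait for the five delays one after another, while in the threaded run the waits overlap. The gain comes from overlapping waiting time, so CPU-bound work would not benefit in the same way.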


 