How can I create threads for two different queues to run in parallel? - Python

I have two queues for different tasks. The first crawl starts crawling the links in the list; it then generates more links to crawl on queue one, and also generates new links for a different task on queue two. My program works, but the problem is: when the workers for queue two start running, the workers for queue one stop. They are basically not running in parallel; each waits for the other to finish its tasks. How can I make them run in parallel?

import threading
from queue import Queue

queue = Queue()
queue_two = Queue()

links = ['www.example.com', 'www.example.com', 'www.example.com',
         'www.example.com', 'www.example.com', 'www.example.com', 
         'www.example.com', 'www.example.com', 'www.example.com']

new_links = []

def create_workers():
    for _ in range(4):
        t = threading.Thread(target=work)
        t.daemon = True
        t.start()

    for _ in range(2):
        t = threading.Thread(target=work_two)
        t.daemon = True
        t.start()

def work():
    while True:
        item = queue.get()
        # do something with item
        queue.task_done()

def work_two():
    while True:
        item = queue_two.get()
        # do something with item
        queue_two.task_done()

def create_jobs():
    for link in links:
        queue.put(link)
    queue.join()
    crawl_two()
    crawl()

def create_jobs_two():
    for link in new_links:
        queue_two.put(link)
    queue_two.join()
    crawl_two()

def crawl():
    queued_links = links
    if len(queued_links) > 0:
        create_jobs()

def crawl_two():
    queued_links = new_links
    if len(queued_links) > 0:
        create_jobs_two()

create_workers()
crawl()

That is because your processing is not parallel between work and work_two.

This is what happens:

  1. You create workers for work and work_two
  2. crawl is called
  3. create_jobs is called - the "work" workers start processing items
  4. create_jobs waits in queue.join() until all of them have completed
  5. crawl_two is called
  6. create_jobs_two is called - the "work_two" workers start processing items
  7. create_jobs_two waits in queue_two.join() until all of them have completed
  8. crawl is called (start again from 2)

Basically you never enter a situation where work and work_two run in parallel, because you use queue.join() to wait until all of the currently running tasks have finished; only then do you assign tasks to the "other" queue. Your work and work_two do run in parallel within themselves, but the control structure ensures work and work_two are mutually exclusive. You need to redesign the loops and queues if you want both of them to run in parallel.

You will probably also want to investigate the use of threading.Lock() to protect your global new_links variable, since I assume you will be appending to it from your worker threads. That is absolutely fine, but you need a lock to ensure two threads do not try to do it simultaneously. This is not related to your current problem; it only helps you avoid the next one.
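For example, a minimal sketch of guarding a shared list with a lock (the name new_links mirrors the question's global; new_links_lock and add_link are invented for illustration):

```python
import threading

new_links = []
new_links_lock = threading.Lock()

def add_link(link):
    # Acquire the lock so only one thread mutates the list at a time
    with new_links_lock:
        new_links.append(link)

# Simulate several worker threads appending concurrently
threads = [threading.Thread(target=add_link, args=('www.example.com/%d' % i,))
           for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(new_links))  # 8 -- no appends were lost
```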

I of course do not know what you are trying to achieve here, but you might solve your problem, and avoid the next one, by scrapping the global new_links completely. What if work and work_two simply fed the other worker's queue directly whenever they needed to submit tasks to it, instead of putting items into a global variable and then feeding them to the queue in the main thread?
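A minimal sketch of that idea, using None sentinels to shut the workers down; the actual crawling and processing steps (the "#do something" parts of the question) are left as comments:

```python
import threading
from queue import Queue

queue = Queue()       # links for the "work" crawlers
queue_two = Queue()   # items for the "work_two" processors

def work():
    while True:
        link = queue.get()
        if link is None:              # sentinel: time to stop
            queue.task_done()
            break
        # crawl the link, then feed the other queue directly --
        # no global list, no hand-off through the main thread
        queue_two.put('result-from-' + link)
        queue.task_done()

def work_two():
    while True:
        item = queue_two.get()
        if item is None:
            queue_two.task_done()
            break
        # process the item (placeholder)
        queue_two.task_done()

workers = [threading.Thread(target=work) for _ in range(4)]
workers += [threading.Thread(target=work_two) for _ in range(2)]
for t in workers:
    t.start()

for link in ['www.example.com'] * 9:
    queue.put(link)

queue.join()                          # all links crawled (and forwarded)
for _ in range(4):
    queue.put(None)                   # stop the crawlers
queue_two.join()                      # all generated items processed
for _ in range(2):
    queue_two.put(None)               # stop the processors
for t in workers:
    t.join()
```

Because each crawler puts its result onto queue_two before calling queue.task_done(), queue.join() returning guarantees every generated item is already queued, so queue_two.join() cannot miss any.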

Or you could build an "orchestration" thread that queues tasks to workers, processes their responses, and then acts on each response accordingly, either queuing it back to one of the queues or accepting the result if it is "ready".
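A rough sketch of such an orchestrator, assuming a single results queue that every worker reports back to; the ('ready', ...) tag and the placeholder processing are invented for illustration:

```python
import threading
from queue import Queue

tasks = Queue()     # work sent to the workers
results = Queue()   # responses the workers send back

def worker():
    while True:
        task = tasks.get()
        if task is None:                  # sentinel: shut down
            tasks.task_done()
            break
        # placeholder processing: every task yields one "ready" result
        results.put(('ready', 'processed-' + task))
        tasks.task_done()

def orchestrate(initial, expected):
    finished = []
    for task in initial:
        tasks.put(task)
    while len(finished) < expected:
        kind, payload = results.get()
        if kind == 'ready':
            finished.append(payload)      # accept the finished result
        else:
            tasks.put(payload)            # or queue it back for more work
    return finished

workers = [threading.Thread(target=worker) for _ in range(3)]
for t in workers:
    t.start()

done = orchestrate(['a', 'b', 'c', 'd'], expected=4)
for _ in workers:
    tasks.put(None)                       # stop the workers
for t in workers:
    t.join()

print(sorted(done))  # ['processed-a', 'processed-b', 'processed-c', 'processed-d']
```

The orchestrator is the only place that decides what happens to a response, which keeps the routing logic out of the workers entirely.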
