Why is my consumer working separately from my producer in the queue?

My objective is to call an API asynchronously and write the result of each call to its own file (1 call -> 1 file). I thought one way to implement this is with a queue. My intention is for producers to push responses into the queue as soon as they are ready, and for consumers to process them (write the files) as soon as they are available.

Confusion: Looking at the print statements when I run my code, I see that the producers all finish first, and only then do the consumers start consuming the output. This does not match my intention of having consumers work on tasks as soon as they become available. I have also considered using multiple processes (one for consumers, one for producers), but I am not sure whether that would just complicate things.

I have created an illustration of the current status:

import aiohttp
import asyncio


async def get_data(session, day):
    async with session.post(url=SOME_URL, json=SOME_FORMAT, headers=HEADERS) as response:
        return await response.text()


async def producer(q, day):
    async with aiohttp.ClientSession() as session:
        result = await get_data(session, day)
        await q.put(result)


async def consumer(q):
    while True:
        outcome = await q.get()
        print("Consumed:", outcome) # assuming I write files here
        q.task_done()


async def main():
    queue = asyncio.Queue()
    days = [day for day in range(20)]  # Here I normally use calendar dates instead of range
    producers = [asyncio.create_task(producer(queue, day) for day in days]
    consumer = asyncio.create_task(consumer(queue)
    await asyncio.gather(*producers)
    await queue.join()
    consumer.cancel()

    if __name__ == '__main__':
        asyncio.run(main())

Am I on the right track?

Your code is generally fine (except for a couple of syntax errors, which I guess are the result of bad copy-paste). All the producers are indeed created before the consumer starts working, because at that point they have nothing to wait for. But if the producers have real work to do, you'll see that they complete it only after the consumer has started working, and then things work fine.
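To see the interleaving without depending on the network, here is a minimal sketch that replaces the HTTP call with `asyncio.sleep`. The names `fake_fetch`, `run_demo`, and the `events` list are illustrative only and not part of the original code, and the delays are arbitrary.

```python
import asyncio


async def fake_fetch(day):
    # Stand-in for the HTTP request (assumption): later days take longer.
    await asyncio.sleep(0.05 * (day + 1))
    return f"payload for day {day}"


async def producer(q, day, events):
    result = await fake_fetch(day)
    events.append(("produced", day))
    await q.put((day, result))


async def consumer(q, events):
    while True:
        day, _result = await q.get()
        events.append(("consumed", day))
        q.task_done()


async def run_demo(n_days=4):
    events = []
    q = asyncio.Queue()
    # create_task only schedules the coroutines; none of them runs until
    # this coroutine yields control at the first await below.
    producers = [asyncio.create_task(producer(q, d, events)) for d in range(n_days)]
    consumer_task = asyncio.create_task(consumer(q, events))
    await asyncio.gather(*producers)
    await q.join()
    consumer_task.cancel()
    return events


if __name__ == "__main__":
    for event in asyncio.run(run_demo()):
        print(event)
```

Because each simulated request takes a different amount of time, the recorded events should alternate between "produced" and "consumed" rather than listing every producer first.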

Here's an edited version of your code, plus output that demonstrates that things are indeed working.

import aiohttp
import asyncio

async def get_data(session, day):
    print(f"get data, day {day}")
    async with session.get(url="https://www.google.com") as response:
        res = await response.text()
    print(f"got data, day {day}")
    return res[:100]

async def producer(q, day):
    async with aiohttp.ClientSession() as session:
        result = await get_data(session, day)
        await q.put(result)

async def consumer(q):
    print("Consumer stated")
    while True:
        outcome = await q.get()
        print("Consumed:", outcome) # assuming I write files here
        asyncio.sleep(1)  # missing "await": the coroutine never runs (hence the RuntimeWarning in the output)
        q.task_done()

async def main():
    queue = asyncio.Queue()
    days = [day for day in range(20)]  # Here I normally use calendar dates instead of range
    producers = [asyncio.create_task(producer(queue, day)) for day in days]
    print("main: producer tasks created")
    consumer_task = asyncio.create_task(consumer(queue))
    print("main: consumer task created")
    await asyncio.gather(*producers)
    print("main: gathered producers")
    await queue.join()
    consumer_task.cancel()

if __name__ == '__main__':
    asyncio.run(main())

Output:

main: producer tasks created
main: consumer task created
get data, day 0
get data, day 1
get data, day 2
get data, day 3
...
get data, day 19
Consumer stated
got data, day 1
Consumed: <!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="en"><head><meta content
queue_so.py:21: RuntimeWarning: coroutine 'sleep' was never awaited
  asyncio.sleep(1)
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
got data, day 10
Consumed: <!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="en"><head><meta content
got data, day 19
got data, day 11
got data, day 14
Consumed: <!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="en"><head><meta content
Consumed: <!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="en"><head><meta content
Consumed: <!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="en"><head><meta content
got data, day 15
got data, day 17
got data, day 6
got data, day 18
Consumed: <!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="en"><head><meta content
Consumed: <!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="en"><head><meta content
Consumed: <!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="en"><head><meta content
Consumed: <!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="en"><head><meta content
got data, day 7
Consumed: <!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="en"><head><meta content
got data, day 8
got data, day 9
got data, day 2
got data, day 12
Consumed: <!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="en"><head><meta content
Consumed: <!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="en"><head><meta content
Consumed: <!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="en"><head><meta content
Consumed: <!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="en"><head><meta content
got data, day 0
got data, day 5
Consumed: <!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="en"><head><meta content
Consumed: <!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="en"><head><meta content
got data, day 4
Consumed: <!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="en"><head><meta content
got data, day 3
Consumed: <!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="en"><head><meta content
got data, day 13
Consumed: <!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="en"><head><meta content
got data, day 16
Consumed: <!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="en"><head><meta content
main: gathered producers
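For the stated goal of one file per call, one possible shape is sketched below. It is not from the original answer: `fake_fetch`, `write_file`, `run_pipeline`, and the `day_<n>.txt` naming are hypothetical, the HTTP call is simulated so the sketch runs offline, and `asyncio.to_thread` requires Python 3.9 or later. Each queue item carries the day so the consumer can name the file, and the blocking write runs in a worker thread so it does not stall the event loop.

```python
import asyncio
import tempfile
from pathlib import Path


async def fake_fetch(day):
    # Stand-in for the real aiohttp request (assumption).
    await asyncio.sleep(0.01)
    return f"response for day {day}"


def write_file(path, text):
    # Ordinary blocking I/O; called through asyncio.to_thread below.
    path.write_text(text, encoding="utf-8")


async def producer(q, day):
    await q.put((day, await fake_fetch(day)))


async def consumer(q, out_dir):
    while True:
        day, text = await q.get()
        try:
            await asyncio.to_thread(write_file, out_dir / f"day_{day}.txt", text)
        finally:
            # Always mark the item done so queue.join() cannot hang.
            q.task_done()


async def run_pipeline(days, out_dir, n_consumers=3):
    q = asyncio.Queue()
    consumers = [asyncio.create_task(consumer(q, out_dir)) for _ in range(n_consumers)]
    await asyncio.gather(*(producer(q, day) for day in days))
    await q.join()  # wait until every queued result has been written
    for task in consumers:
        task.cancel()
    # Collect the cancellations so no task is left pending.
    await asyncio.gather(*consumers, return_exceptions=True)


if __name__ == "__main__":
    out_dir = Path(tempfile.mkdtemp())
    asyncio.run(run_pipeline(range(5), out_dir))
    print(sorted(p.name for p in out_dir.iterdir()))
```

With real requests, `fake_fetch` would be replaced by a coroutine such as `get_data` from the code above.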
