
Get JSON using Python and AsyncIO

Not so long ago, I began to learn asyncio, and I ran into a problem: my code does not terminate, and I can't figure out why. Help me please!

import signal
import sys
import asyncio
import aiohttp
import json

loop = asyncio.get_event_loop()
client = aiohttp.ClientSession(loop=loop)

async def get_json(client, url):
    async with client.get(url) as response:
        assert response.status == 200
        return await response.read()

async def get_reddit_cont(subreddit, client):
    data1 = await get_json(client, 'https://www.reddit.com/r/' + subreddit + '/top.json?sort=top&t=day&limit=50')

    jn = json.loads(data1.decode('utf-8'))

    print('DONE:', subreddit)

def signal_handler(signal, frame):
    loop.stop()
    client.close()
    sys.exit(0)

signal.signal(signal.SIGINT, signal_handler)

for key in {'python':1, 'programming':2, 'compsci':3}:
    asyncio.ensure_future(get_reddit_cont(key, client))
loop.run_forever()

Result:

DONE: compsci  
DONE: programming  
DONE: python  
...

I tried to accomplish something, but the result was not stable.

future = []
for key in {'python':1, 'programming':2, 'compsci':3}:
    future=asyncio.ensure_future(get_reddit_cont(key, client))
loop.run_until_complete(future)

Result (1 task instead of 3):

DONE: compsci  
[Finished in 1.5s]  

I solved my question in this way:

I added:

async with aiohttp.ClientSession() as client:

in:

async def get_reddit_cont(subreddit, client):

and:

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    futures = [get_reddit_cont(subreddit,client) for subreddit in range(1,6)]
    result = loop.run_until_complete(asyncio.gather(*futures))

But when the code is completed, I get the message:

Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x034021F0>
[Finished in 1.0s]

I don't understand why this is happening.

But when I try to execute the "for key" loop about 60 or more times, I get an error:

...
aiohttp.client_exceptions.ClientOSError: [WinError 10054] Remote host forcibly terminated an existing connection

The answer lies in your code. Here's the clue: loop.run_forever(). So you will need to call loop.stop(). I would use a condition, such as an if clause or a while loop.

if we_have_what_we_need:
    signal_handler(signal, frame)

or

while we_dont_have_what_we_need:
    loop.run_forever()

The first will stop your code when the condition is met. The latter will keep going until the condition is met.
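
For example, here is a minimal self-contained sketch of the first pattern; the fake_work coroutine and the pending counter are invented stand-ins for your real requests:

import asyncio

loop = asyncio.get_event_loop()
pending = 3  # how many tasks we are still waiting on

async def fake_work(name):
    global pending
    await asyncio.sleep(0.1)      # stand-in for a real request
    print('DONE:', name)
    pending -= 1
    if pending == 0:              # "we_have_what_we_need"
        loop.stop()               # makes run_forever() return

for name in ('python', 'programming', 'compsci'):
    asyncio.ensure_future(fake_work(name))

loop.run_forever()
loop.close()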

[UPDATE]

We can also use:

(Python Docs)

loop.run_until_complete(future)

Run until the future (an instance of Future) has completed.

If the argument is a coroutine object, it is implicitly scheduled to run as an asyncio.Task.

Return the Future's result or raise its exception.

loop.run_forever()

Run the event loop until stop() is called.
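
To make the difference concrete, here is a sketch (the fetch_all coroutine is invented) that runs both against the same loop:

import asyncio

async def fetch_all():
    await asyncio.sleep(0.1)  # placeholder for real work
    return 'done'

loop = asyncio.get_event_loop()

# run_until_complete() drives the loop just long enough for this one
# future, then hands back its result.
print(loop.run_until_complete(fetch_all()))

# run_forever() keeps the loop alive until something calls loop.stop();
# here we schedule the stop ourselves so the example terminates.
loop.call_soon(loop.stop)
loop.run_forever()
loop.close()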

Here are a few suggested changes, with context in the comments.

Unless you really have a unique use-case, or are just experimenting for learning's sake, there probably shouldn't be a reason to use signal -- asyncio has top-level functions that let you decide when to close and terminate the event loop.

import asyncio
import logging
import sys

import aiohttp

logging.basicConfig(stream=sys.stdout, level=logging.DEBUG,
                    format='%(asctime)s:%(message)s')

URL = 'https://www.reddit.com/r/{subreddit}/top.json?sort=top&t=day&limit=50'


async def get_json(client: aiohttp.ClientSession, url: str) -> dict:
    # If you're going to be making repeated requests, use this
    # over .get(), which is just a wrapper around `.request()` and
    # involves an unneeded lookup
    async with client.request('GET', url) as response:

        # Raise if the response code is >= 400.
        # Some 200 codes may still be "ok".
        # You can also pass raise_for_status within
        # client.request().
        response.raise_for_status()

        # Let your code be fully async.  The call to json.loads()
        # is blocking and won't take full advantage.
        #
        # And it does largely the same thing you're doing now:
        # https://github.com/aio-libs/aiohttp/blob/76268e31630bb8615999ec40984706745f7f82d1/aiohttp/client_reqrep.py#L985
        j = await response.json()
        logging.info('DONE: got %s, size %s', url, j.__sizeof__())
        return j


async def get_reddit_cont(keys, **kwargs) -> list:
    async with aiohttp.ClientSession(**kwargs) as session:
        # Use a single session as a context manager.
        # this enables connection pooling, which matters a lot when
        # you're only talking to one site
        tasks = []
        for key in keys:
            # create_task: Python 3.7+
            task = asyncio.create_task(
                get_json(session, URL.format(subreddit=key)))
            tasks.append(task)
        # The result of this will be a list of dictionaries
        # It will only return when all of your subreddits
        # have given you a response & been decoded
        #
        # To process greedily, use asyncio.as_completed()
        return await asyncio.gather(*tasks, return_exceptions=True)


if __name__ == '__main__':
    default = ('python', 'programming', 'compsci')
    keys = sys.argv[1:] if len(sys.argv) > 1 else default
    sys.exit(asyncio.run(get_reddit_cont(keys=keys)))

Output:

$ python3 asyncreddit.py 
2018-11-07 21:44:49,495:Using selector: KqueueSelector
2018-11-07 21:44:49,653:DONE: got https://www.reddit.com/r/compsci/top.json?sort=top&t=day&limit=50, size 216
2018-11-07 21:44:49,713:DONE: got https://www.reddit.com/r/python/top.json?sort=top&t=day&limit=50, size 216
2018-11-07 21:44:49,947:DONE: got https://www.reddit.com/r/programming/top.json?sort=top&t=day&limit=50, size 216
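
As the comment in get_reddit_cont() mentions, asyncio.as_completed() lets you handle each response as soon as it lands instead of waiting for the whole batch. Here is a hedged variant (get_reddit_cont_greedy is my name for it), reusing get_json() and URL from the snippet above:

async def get_reddit_cont_greedy(keys, **kwargs) -> list:
    results = []
    async with aiohttp.ClientSession(**kwargs) as session:
        coros = [get_json(session, URL.format(subreddit=key))
                 for key in keys]
        for fut in asyncio.as_completed(coros):
            # The first request to finish is yielded first, so you can
            # start processing it while the others are still in flight.
            results.append(await fut)
    return results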

Edit: from your question:

But when the code is completed, I get the message: Unclosed client session

This is because you need to .close() the client object, just as you would a file object. You can do that in two ways:

  • Call it explicitly: client.close(). (In aiohttp 3.x, .close() is a coroutine, so it must be awaited.) It is safer to wrap this in a try / finally block to make sure that it's closed no matter what.
  • Or (the easier way), use the client as an async context manager, as in this answer. This means that, after the async with block is over, the session is automatically closed via its .__aexit__() method.
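
A minimal sketch of both options (the URL is just an example):

import aiohttp

async def explicit_close():
    client = aiohttp.ClientSession()
    try:
        async with client.get('https://www.reddit.com/r/python/top.json') as resp:
            return await resp.read()
    finally:
        # In aiohttp 3.x, .close() is a coroutine, so await it.
        await client.close()

async def context_manager_close():
    # __aexit__() awaits .close() for you when the block ends.
    async with aiohttp.ClientSession() as client:
        async with client.get('https://www.reddit.com/r/python/top.json') as resp:
            return await resp.read()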

The connector is the underlying TCPConnector, which is an attribute of the session. It handles the connection pooling, and it's what ultimately is left open in your code.
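
If you ever need to tune that pooling yourself -- for instance, when a large fan-out like your 60-key loop starts triggering WinError 10054 -- you can pass your own TCPConnector. The limit parameter is real (its default is 100); the cap of 10 below is just a guess to illustrate throttling:

import asyncio
import aiohttp

async def main():
    # A lower limit throttles how many sockets are open at once, which
    # can help when the remote host starts dropping connections.
    connector = aiohttp.TCPConnector(limit=10)
    async with aiohttp.ClientSession(connector=connector) as session:
        async with session.get('https://www.reddit.com/r/python/top.json') as resp:
            print(resp.status)

asyncio.run(main())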

I solved the problem in this way:

import asyncio
import aiohttp
import json

async def get_json(client, url):
    async with client.get(url) as response:
        assert response.status == 200
        return await response.read()

async def get_reddit_cont(subreddit):
    async with aiohttp.ClientSession(loop=loop) as client:
        data1 = await get_json(client, 'https://www.reddit.com/r/' + subreddit + '/top.json?sort=top&t=day&limit=50')

        jn = json.loads(data1.decode('utf-8'))

        print('DONE:', subreddit)

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    futures = [get_reddit_cont(subreddit) for subreddit in {'python':1, 'programming':2, 'compsci':3}]
    result = loop.run_until_complete(asyncio.gather(*futures))
