Get JSON using Python and AsyncIO
Not so long ago, I began to learn asyncio. And I ran into a problem: my code is not terminating. I can't figure it out. Help me please!
import signal
import sys
import asyncio
import aiohttp
import json

loop = asyncio.get_event_loop()
client = aiohttp.ClientSession(loop=loop)

async def get_json(client, url):
    async with client.get(url) as response:
        assert response.status == 200
        return await response.read()

async def get_reddit_cont(subreddit, client):
    data1 = await get_json(client, 'https://www.reddit.com/r/' + subreddit + '/top.json?sort=top&t=day&limit=50')
    jn = json.loads(data1.decode('utf-8'))
    print('DONE:', subreddit)

def signal_handler(signal, frame):
    loop.stop()
    client.close()
    sys.exit(0)

signal.signal(signal.SIGINT, signal_handler)

for key in {'python':1, 'programming':2, 'compsci':3}:
    asyncio.ensure_future(get_reddit_cont(key, client))

loop.run_forever()
Result:
DONE: compsci
DONE: programming
DONE: python
...
I tried to accomplish something, but the result was not stable.
future = []
for key in {'python':1, 'programming':2, 'compsci':3}:
    future = asyncio.ensure_future(get_reddit_cont(key, client))

loop.run_until_complete(future)
Result (1 task instead of 3):
DONE: compsci
[Finished in 1.5s]
I solved my question in this way.

Added:

async with aiohttp.ClientSession() as client:

in:

async def get_reddit_cont(subreddit, client):

And:

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    futures = [get_reddit_cont(subreddit, client) for subreddit in range(1,6)]
    result = loop.run_until_complete(asyncio.gather(*futures))
But when the code is completed, I get the message:
Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x034021F0>
[Finished in 1.0s]
I don't understand why this is happening.
And when I try to execute the "for key" loop about 60 or more times, I get an error:
...
...
aiohttp.client_exceptions.ClientOSError: [WinError 10054] Remote host forcibly terminated an existing connection
The answer lies in your code. Here's the clue: loop.run_forever(). So you will need to call loop.stop(). I would use a condition, such as an if clause or a while loop.
if we_have_what_we_need:
    signal_handler(signal, frame)

or

while we_dont_have_what_we_need:
    loop.run_forever()
The first will stop your code when the condition is met. The latter will keep going until the condition is met.
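A minimal, stdlib-only sketch of stopping a run_forever() loop from a condition check (the names results, produce, and check_done are illustrative, not from the question):

```python
import asyncio

results = []

def check_done(loop):
    # Stop the loop once we have what we need;
    # otherwise check again shortly.
    if len(results) >= 3:
        loop.stop()
    else:
        loop.call_later(0.01, check_done, loop)

async def produce(i):
    # Stand-in for a real request coroutine.
    await asyncio.sleep(0.01)
    results.append(i)

loop = asyncio.new_event_loop()
for i in range(3):
    loop.create_task(produce(i))
loop.call_soon(check_done, loop)
loop.run_forever()  # returns only after loop.stop() is called
loop.close()
```

Without the check_done callback, run_forever() would block indefinitely even after all three tasks finished, which is exactly the hang in the question.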
[UPDATE]
We can also use:

loop.run_until_complete(future)

Run until the future (an instance of Future) has completed. If the argument is a coroutine object, it is implicitly scheduled to run as an asyncio.Task. Return the Future's result or raise its exception.

loop.run_forever()

Run the event loop until stop() is called.
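A minimal sketch of run_until_complete() for contrast (fetch_fake is a hypothetical stand-in for a real request coroutine):

```python
import asyncio

async def fetch_fake(key):
    # Hypothetical stand-in for an actual aiohttp request.
    await asyncio.sleep(0.01)
    return key.upper()

loop = asyncio.new_event_loop()
try:
    # The coroutine is implicitly wrapped in a Task; the call
    # returns its result once it completes -- no loop.stop() needed.
    result = loop.run_until_complete(fetch_fake('python'))
finally:
    loop.close()
```

Because run_until_complete() returns as soon as its argument finishes, it is usually the better fit for "run these requests and exit" scripts like the one in the question.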
Here are a few suggested changes, with context in the comments.

Unless you really have a unique use-case, or are just experimenting for learning's sake, there probably shouldn't be a reason to use signal -- asyncio has top-level functions that let you decide when to close and terminate the event loop.
import asyncio
import logging
import sys

import aiohttp

logging.basicConfig(stream=sys.stdout, level=logging.DEBUG,
                    format='%(asctime)s:%(message)s')

URL = 'https://www.reddit.com/r/{subreddit}/top.json?sort=top&t=day&limit=50'

async def get_json(client: aiohttp.ClientSession, url: str) -> dict:
    # If you're going to be making repeated requests, use this
    # over .get(), which is just a wrapper around `.request()` and
    # involves an unneeded lookup
    async with client.request('GET', url) as response:
        # Raise if the response code is >= 400.
        # Some 200 codes may still be "ok".
        # You can also pass raise_for_status within
        # client.request().
        response.raise_for_status()
        # Let your code be fully async. The call to json.loads()
        # is blocking and won't take full advantage.
        #
        # And it does largely the same thing you're doing now:
        # https://github.com/aio-libs/aiohttp/blob/76268e31630bb8615999ec40984706745f7f82d1/aiohttp/client_reqrep.py#L985
        j = await response.json()
        logging.info('DONE: got %s, size %s', url, j.__sizeof__())
        return j

async def get_reddit_cont(keys, **kwargs) -> list:
    async with aiohttp.ClientSession(**kwargs) as session:
        # Use a single session as a context manager.
        # This enables connection pooling, which matters a lot when
        # you're only talking to one site
        tasks = []
        for key in keys:
            # create_task: Python 3.7+
            task = asyncio.create_task(
                get_json(session, URL.format(subreddit=key)))
            tasks.append(task)
        # The result of this will be a list of dictionaries.
        # It will only return when all of your subreddits
        # have given you a response & been decoded.
        #
        # To process greedily, use asyncio.as_completed()
        return await asyncio.gather(*tasks, return_exceptions=True)

if __name__ == '__main__':
    default = ('python', 'programming', 'compsci')
    keys = sys.argv[1:] if len(sys.argv) > 1 else default
    sys.exit(asyncio.run(get_reddit_cont(keys=keys)))
Output:
$ python3 asyncreddit.py
2018-11-07 21:44:49,495:Using selector: KqueueSelector
2018-11-07 21:44:49,653:DONE: got https://www.reddit.com/r/compsci/top.json?sort=top&t=day&limit=50, size 216
2018-11-07 21:44:49,713:DONE: got https://www.reddit.com/r/python/top.json?sort=top&t=day&limit=50, size 216
2018-11-07 21:44:49,947:DONE: got https://www.reddit.com/r/programming/top.json?sort=top&t=day&limit=50, size 216
Edit: from your question:
But when the code is completed, I get the message:

Unclosed client session
This is because you need to .close() the client object, just as you would a file object. You can do that in two ways:

1. Call client.close() directly. It is safer to wrap this in a try / finally block to make sure that it's closed no matter what.

2. Use an async with block. When the block is over, the session is automatically closed via its .__aexit__() method.

The connector is the underlying TCPConnector, which is an attribute of the session. It handles the connection pooling, and it's what ultimately is left open in your code.
I solved the problem in this way:
import asyncio
import aiohttp
import json

async def get_json(client, url):
    async with client.get(url) as response:
        assert response.status == 200
        return await response.read()

async def get_reddit_cont(subreddit):
    async with aiohttp.ClientSession(loop=loop) as client:
        data1 = await get_json(client, 'https://www.reddit.com/r/' + subreddit + '/top.json?sort=top&t=day&limit=50')
        jn = json.loads(data1.decode('utf-8'))
        print('DONE:', subreddit)

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    futures = [get_reddit_cont(subreddit) for subreddit in {'python':1, 'programming':2, 'compsci':3}]
    result = loop.run_until_complete(asyncio.gather(*futures))
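One property of asyncio.gather() worth knowing here: it returns results in the order of its arguments, not in completion order. A stdlib-only sketch (get_fake is a hypothetical stand-in for the Reddit request, with deliberately mismatched delays):

```python
import asyncio

async def get_fake(subreddit, delay):
    # Hypothetical stand-in for the real HTTP request.
    await asyncio.sleep(delay)
    return 'DONE: ' + subreddit

async def main():
    # The first subreddit gets the longest delay, so it
    # finishes last -- yet it is still first in the results.
    pairs = [('python', 0.03), ('programming', 0.02), ('compsci', 0.01)]
    return await asyncio.gather(*(get_fake(s, d) for s, d in pairs))

results = asyncio.run(main())
```

So the out-of-order "DONE:" prints in the question come from the print() inside each coroutine, while the gathered return values stay in argument order.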