Multithreading/multiprocessing with a Python loop

Question

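I have a script that loops through a range of URLs and extracts item locations from the returned JSON data. However, the script takes 60 minutes to run, and 55 of those minutes (according to cProfile) are spent waiting for the JSON data to load.

I would like to speed this up by using multithreading to run several POST requests at once, and I initially split the URL range in half to do so. Where I am stuck is how to implement the multithreading or asyncio part.

Stripped-down code: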

import asyncio
import aiohttp

# I do not recommend using globals
results = dict()
url = "https://www.website.com/store/ajax/search"
query = "store={}&size=18&query=17360031"

# Default URL opener, taken from the aiohttp documentation
async def open_url(store, loop=None):
    async with aiohttp.ClientSession(loop=loop) as session:
        async with session.post(url, data={'searchQuery': query.format(store)}) as resp:
            return await resp.json(), store

async def processing(loop=None):
    # Use the 'global' keyword if you want to write to global variables
    global results
    # One of the simplest ways to parallelize requests: create all the coroutines up front, then save each result to the global as soon as it is ready
    tasks = [open_url(store, loop=loop) for store in range(0, 5)]
    for coro in asyncio.as_completed(tasks, loop=loop):
        try:
            data, store = await coro
            results[store] = data['searchResults']['results'][0]['location']['aisle']
        except (IndexError, KeyError):
            continue


if __name__ == '__main__':
    event_loop = asyncio.new_event_loop()
    event_loop.run_until_complete(processing(loop=event_loop))

# Print Results
for store, data in results.items():
    print(store, data)

JSON:

    {u'count': 1,
     u'results': [{u'department': {u'name': u'Home', u'storeDeptId': -1},
           u'location': {u'aisle': [A], u'detailed': [A.536]},
           u'score': u'0.507073'}],
     u'totalCount': 1}

Answer 1

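Even if you use multithreading or multiprocessing, each thread or process will still block until its JSON data has been retrieved. That may speed things up somewhat, but it is still not your best option.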
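
To illustrate the point about threads, here is a minimal sketch using the standard library's `concurrent.futures.ThreadPoolExecutor`. Each worker still blocks on its own request, but the waits overlap. `fetch_store` is a hypothetical stand-in that sleeps instead of sending the real POST request, and the store range and worker count are assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_store(store):
    """Hypothetical stand-in for one blocking POST request."""
    time.sleep(0.1)  # simulates waiting on the server
    return store, {"aisle": "A{}".format(store)}


def fetch_all(stores, max_workers=10):
    """Run fetch_store for every store on a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order; each worker blocks on its own call
        return dict(pool.map(fetch_store, stores))


start = time.perf_counter()
results = fetch_all(range(10))
elapsed = time.perf_counter() - start
print(len(results), "results in", round(elapsed, 2), "seconds")
```

With ten workers the ten simulated requests wait at the same time, so the total should be close to one request's latency rather than ten times that.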

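Since you are using requests, try grequests, which combines requests with gevent. It lets you define a series of HTTP requests that run asynchronously, and as a result you will get a huge speed boost. Usage is very simple: just create a list of requests (using grequests.get) and pass it to grequests.map.

Hope this helps!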
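
A minimal sketch of that usage follows. It assumes the third-party `grequests` package is installed and reuses the URL and query string from the question. `grequests.post` is used rather than `grequests.get` because the question sends POST requests. The helper names and the pool size of 50 are assumptions.

```python
try:
    import grequests  # third-party; import it before other networking modules
except ImportError:  # lets the parsing helper below work without grequests
    grequests = None

URL = "https://www.website.com/store/ajax/search"  # placeholder from the question
QUERY = "store={}&size=18&query=17360031"


def extract_aisle(data):
    """Return the aisle of the first search result, or None if it is absent."""
    try:
        return data["searchResults"]["results"][0]["location"]["aisle"]
    except (IndexError, KeyError, TypeError):
        return None


def fetch_aisles(stores, pool_size=50):
    """Send one POST per store concurrently and map store -> aisle."""
    if grequests is None:
        raise RuntimeError("grequests is not installed")
    stores = list(stores)
    pending = [grequests.post(URL, data={"searchQuery": QUERY.format(s)})
               for s in stores]
    # grequests.map returns responses in request order; a failed request yields None
    responses = grequests.map(pending, size=pool_size)
    aisles = {}
    for store, resp in zip(stores, responses):
        if resp is None or not resp.ok:
            continue  # skip requests that failed
        try:
            aisles[store] = extract_aisle(resp.json())
        except ValueError:  # body was not valid JSON
            continue
    return aisles
```

Calling `fetch_aisles(range(0, 5))` would return a dict keyed by store number, leaving out any store whose request failed.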

Answer 2

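If you want to parallelize the requests (I hope that is what you asked for), this snippet will help. It contains a request opener and sends 2000 POST requests through aiohttp and asyncio. Python 3.5 was used.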

import asyncio
import aiohttp

# I do not recommend using globals
results = dict()
MAX_RETRIES = 5
MATCH_SLEEP_TIME = 3  # consider moving these constants to a separate file such as constants.py
url = "https://www.website.com/store/ajax/search"
query = "store={}&size=18&query=44159"

# Written for Python 3.5; asyncio removed the explicit loop= arguments in Python 3.10
# Default URL opener, taken from the aiohttp documentation
async def open_url(store, semaphore, loop=None):
    for _ in range(MAX_RETRIES):
        async with semaphore:
            try:
                async with aiohttp.ClientSession(loop=loop) as session:
                    async with session.post(url, data={'searchQuery': query.format(store)}) as resp:
                        return await resp.json(), store
            except ConnectionResetError:
                # You can handle more exceptions here and sleep when they are raised
                await asyncio.sleep(MATCH_SLEEP_TIME, loop=loop)
                continue
    return None

async def processing(semaphore, loop=None):
    # Use the 'global' keyword if you want to write to global variables
    global results
    # One of the simplest ways to parallelize requests: create all the coroutines up front, then save each result to the global as soon as it is ready
    tasks = [open_url(store, semaphore, loop=loop) for store in range(0, 2000)]
    for coro in asyncio.as_completed(tasks, loop=loop):
        try:
            response = await coro
            if response is None:
                continue
            data, store = response
            results[store] = data['searchResults']['results'][0]['location']['aisle']
        except (IndexError, KeyError):
            continue


if __name__ == '__main__':
    event_loop = asyncio.new_event_loop()
    semaphore = asyncio.Semaphore(50, loop=event_loop)  # count of concurrent requests
    event_loop.run_until_complete(processing(semaphore, loop=event_loop))
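
The snippet above passes `loop=` explicitly, which asyncio stopped accepting in Python 3.10. As a sketch of the same semaphore-bounded pattern on newer interpreters, the following uses `asyncio.run` and `async with`. To keep it runnable without a network or aiohttp, `fake_fetch` is a hypothetical stand-in for the POST request; the limit of 50 mirrors the value used above, and the other names are assumptions.

```python
import asyncio


async def fake_fetch(store):
    """Hypothetical stand-in for the aiohttp POST; returns (data, store)."""
    await asyncio.sleep(0.01)  # simulates network latency
    aisle = "A{}".format(store)
    data = {"searchResults": {"results": [{"location": {"aisle": aisle}}]}}
    return data, store


async def bounded_fetch(store, semaphore, fetch):
    # Only `limit` coroutines can hold the semaphore at once, so at most
    # that many requests are in flight at any moment.
    async with semaphore:
        return await fetch(store)


async def processing(stores, limit=50, fetch=fake_fetch):
    semaphore = asyncio.Semaphore(limit)  # created inside the running loop
    tasks = [bounded_fetch(store, semaphore, fetch) for store in stores]
    results = {}
    # Handle each response as soon as it completes, in completion order
    for coro in asyncio.as_completed(tasks):
        data, store = await coro
        try:
            results[store] = data["searchResults"]["results"][0]["location"]["aisle"]
        except (IndexError, KeyError):
            continue
    return results


results = asyncio.run(processing(range(200)))
print(len(results))
```

Returning the dict from `processing` instead of writing to a global keeps the coroutine reusable; swapping `fake_fetch` for a real aiohttp call would ideally share one `ClientSession` across all requests.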

Answer 1
0 2016-12-23 22:29:07

Answer 2
0 2016-12-23 22:50:37
