
async version runs slower than the non-async version

My program does the following:

  1. Takes a folder of .txt files
  2. For each file:

    2.1. read the file

    2.2. sort the contents as a list and push the list to a master list

I did this without any async/await, and these are the time statistics:

real    0m0.036s

user    0m0.018s

sys     0m0.009s
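
For reference, a minimal synchronous sketch of the steps above (not necessarily the exact code that produced these timings) looks like this:

import os

directory = "/tmp"
listOfLists = list()

def processFiles():
    # step 1: walk the folder of .txt files
    for filename in os.listdir(directory):
        if filename.endswith(".txt"):
            # step 2.1: read the file
            with open(os.path.join(directory, filename), 'r') as fin:
                numbersInList = [int(line, 10) for line in fin]
            # step 2.2: sort and push to the master list
            numbersInList.sort()
            listOfLists.append(numbersInList)

if __name__ == "__main__":
    processFiles()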

With the async/await code below, I get:

real    0m0.144s

user    0m0.116s

sys     0m0.029s

which, given the use case, suggests that I am using asyncio incorrectly.

Does anybody have an idea what I am doing wrong?

import asyncio
import aiofiles
import os

directory = "/tmp"
listOfLists = list()

async def sortingFiles(numbersInList):
    numbersInList.sort()

async def awaitProcessFiles(filename,numbersInList):
    await readFromFile(filename,numbersInList)
    await sortingFiles(numbersInList)
    await appendToList(numbersInList)


async def readFromFile(filename,numbersInList):
    # read every line; the async with block closes the file on exit
    async with aiofiles.open(directory+"/"+filename, 'r') as fin:
        async for line in fin:
            numbersInList.append(int(line.strip("\n"),10))

async def appendToList(numbersInList):
    listOfLists.append(numbersInList)

async def main():
    tasks=[]
    for filename in os.listdir(directory):
        if filename.endswith(".txt"):  
            numbersInList =list()
            task=asyncio.ensure_future(awaitProcessFiles(filename,numbersInList))
            tasks.append(task)
    await asyncio.gather(*tasks)   

if __name__== "__main__":
    asyncio.run(main())

Profiling info:

        151822 function calls (151048 primitive calls) in 0.239 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       11    0.050    0.005    0.050    0.005 {built-in method _imp.create_dynamic}
       57    0.022    0.000    0.022    0.000 {method 'read' of '_io.BufferedReader' objects}
       57    0.018    0.000    0.018    0.000 {built-in method io.open_code}
      267    0.012    0.000    0.012    0.000 {method 'control' of 'select.kqueue' objects}
       57    0.009    0.000    0.009    0.000 {built-in method marshal.loads}
      273    0.009    0.000    0.009    0.000 {method 'recv' of '_socket.socket' objects}
      265    0.005    0.000    0.098    0.000 base_events.py:1780(_run_once)
      313    0.004    0.000    0.004    0.000 {built-in method posix.stat}
      122    0.004    0.000    0.004    0.000 {method 'acquire' of '_thread.lock' objects}
  203/202    0.003    0.000    0.011    0.000 {built-in method builtins.__build_class__}
     1030    0.003    0.000    0.015    0.000 thread.py:158(submit)
     1030    0.003    0.000    0.009    0.000 futures.py:338(_chain_future)
     7473    0.003    0.000    0.003    0.000 {built-in method builtins.hasattr}
     1030    0.002    0.000    0.017    0.000 futures.py:318(_copy_future_state)
       36    0.002    0.000    0.002    0.000 {built-in method posix.getcwd}
     3218    0.002    0.000    0.077    0.000 {method 'run' of 'Context' objects}
     6196    0.002    0.000    0.003    0.000 threading.py:246(__enter__)
     3218    0.002    0.000    0.078    0.000 events.py:79(_run)
     6192    0.002    0.000    0.004    0.000 base_futures.py:13(isfuture)
     1047    0.002    0.000    0.002    0.000 threading.py:222(__init__)

Make some test files...

import random, os
path = <directory name here>
nlines = range(1000)
nfiles = range(1,101)
for n in nfiles:
    fname = f'{n}.txt'
    with open(os.path.join(path,fname),'w') as f:
        for _ in nlines:
            q = f.write(f'{random.randrange(1,10000)}\n')

asyncio makes little sense for local files. That is why even the Python standard library does not provide async file operations.

async for line in fin:

Consider the above line. The event loop pauses the coroutine at every line read and executes some other coroutine. That means the following lines of the file, already sitting in the CPU cache, are thrown away to make space for the next coroutine (they will still be in RAM, though).
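
One way to cut down on those per-line suspensions (a sketch, not code from the original post, and it still pays aiofiles' thread-pool overhead) is to await a single read of the whole file and split it in memory:

import aiofiles

async def readWholeFile(path, numbersInList):
    # one await for the whole file instead of one await per line
    async with aiofiles.open(path, 'r') as fin:
        contents = await fin.read()
    for line in contents.splitlines():
        numbersInList.append(int(line, 10))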

When should aiofiles be used?

Suppose you already use async code in your program and occasionally have to do some file processing. If the file processing were done in the same event loop, all the other coroutines would be blocked. In that case you can either use aiofiles or do the processing in a different executor, as sketched below.
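
For example, here is a sketch of the executor option using asyncio.to_thread (not part of the original answer): the blocking file work runs in a worker thread while the event loop keeps serving other coroutines.

import asyncio

def parseFile(path):
    # ordinary blocking I/O, executed in a worker thread
    with open(path, 'r') as fin:
        numbers = [int(line, 10) for line in fin]
    numbers.sort()
    return numbers

async def processFile(path):
    # asyncio.to_thread needs Python 3.9+; on older versions use
    # loop.run_in_executor(None, parseFile, path) instead
    return await asyncio.to_thread(parseFile, path)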

If all the program does is read from files, it will be faster to do it sequentially so that it makes good use of the cache. Jumping from one file to another is like a thread context switch and should make it slower.
