
Python asyncio runs slower

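I am new to Python and to parallel execution and async. Am I doing something wrong? My code runs slower than (or at best as fast as) the same script written the traditional way, without asyncio.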

import asyncio, os, time, pandas as pd
start_time = time.time()

async def main():
    coroutines = list()
    for root, dirs, files in os.walk('.', topdown=True):
        for file in files:
            coroutines.append(cleaner(file))
        await asyncio.gather(*coroutines)

async def cleaner(file):
    df = pd.read_csv(file, sep='\n', header=None, engine='python', quoting=3)
    df = df[0].str.strip(' \t"').str.split('[,|;: \t]+', 1, expand=True).rename(columns={0: 'email', 1: 'data'})
    df[['email', 'data']].to_csv('x1', sep=':', index=False, header=False, mode='a', compression='gzip')


asyncio.run(main())
print("--- %s seconds ---" % (time.time() - start_time))

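Your workload appears to be: read a file --> process it with pandas --> write a file. That is an ideal fit for multiprocessing, because each work item is largely independent of the others. Like any blocking operation, the pandas routines that read from and write to the file system are not good candidates for asyncio unless you run them in a thread or process pool from asyncio.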
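To illustrate the point about blocking calls, here is a minimal sketch. `blocking_work` is a hypothetical stand-in for the pandas read/write (it only sleeps), and `asyncio.to_thread` assumes Python 3.9 or later. A coroutine that makes a blocking call never yields to the event loop, so `asyncio.gather` ends up running such coroutines one after another.

```python
import asyncio
import time

def blocking_work(delay):
    # Hypothetical stand-in for pd.read_csv / to_csv: blocks the calling thread.
    time.sleep(delay)
    return delay

async def blocking_coro(delay):
    # No await on anything that yields, so the event loop is stuck
    # until the call returns; gather() runs these one at a time.
    return blocking_work(delay)

async def threaded_coro(delay):
    # Hands the blocking call to a worker thread and awaits its result,
    # leaving the event loop free to start the other jobs.
    return await asyncio.to_thread(blocking_work, delay)

async def timed(make_coro, n=4, delay=0.2):
    # Run n jobs through gather() and return the elapsed wall-clock time.
    start = time.perf_counter()
    await asyncio.gather(*(make_coro(delay) for _ in range(n)))
    return time.perf_counter() - start

def compare():
    serial = asyncio.run(timed(blocking_coro))
    overlapped = asyncio.run(timed(threaded_coro))
    return serial, overlapped

if __name__ == "__main__":
    serial, overlapped = compare()
    print(f"blocking coroutines: {serial:.2f}s, to_thread: {overlapped:.2f}s")
```

The sleep here releases the GIL, so threads overlap well. CPU-heavy pandas processing may not overlap nearly as much in threads, which is one reason to consider processes for this workload.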

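Independent operations like these are good candidates for true parallel execution, which asyncio by itself does not give you. (Its thread and process pools are also good options.)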

import multiprocessing as mp
import os

def walk_all_files(path):
    for root, dirs, files in os.walk(path, topdown=True):
        for file in files:
            yield os.path.join(root, file)

def cleaner(path):
    # placeholder: the real pandas read/clean/write for one file goes here
    return "sparkly"

def clean_all(path="."):
    files = list(walk_all_files(path))
    if not files:
        # mp.Pool(0) raises ValueError, so return early when there is no work
        return []
    # using cpu*2 assuming that there is a lot of cpu heavy
    # work that can be done by some processes while others
    # wait on I/O. This is only a guess.
    cpu_count = min(len(files), mp.cpu_count()*2)
    with mp.Pool(cpu_count) as pool:
        # assuming processing is fairly long but also kind of random depending on
        # file contents, setting chunksize to 1 so that subprocess gets new work
        # item from parent on each round. You could set it higher to have fewer
        # interactions between parent and worker.
        result = pool.map(cleaner, files, chunksize=1)
    return result

if __name__ == "__main__":
    clean_all(".")
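
If you would rather stay inside asyncio, the thread and process pools mentioned above can be driven with `loop.run_in_executor`. The following is only a sketch under assumptions: `clean_all_async` is a hypothetical name, and `cleaner` is a placeholder that returns the file size instead of doing the pandas work. The executor is passed in so that either a `ProcessPoolExecutor` or a `ThreadPoolExecutor` can be used.

```python
import asyncio
import os
from concurrent.futures import ProcessPoolExecutor

def walk_all_files(path):
    # Yield the full path of every file below `path`.
    for root, dirs, files in os.walk(path):
        for file in files:
            yield os.path.join(root, file)

def cleaner(path):
    # Hypothetical placeholder for the per-file work; returns the size in bytes.
    return os.path.getsize(path)

async def clean_all_async(executor, path="."):
    loop = asyncio.get_running_loop()
    files = list(walk_all_files(path))
    # Each call is submitted to the pool; the event loop only awaits the results.
    jobs = [loop.run_in_executor(executor, cleaner, f) for f in files]
    results = await asyncio.gather(*jobs)
    return dict(zip(files, results))

if __name__ == "__main__":
    # Worker processes give real parallelism for CPU-heavy cleaning.
    with ProcessPoolExecutor() as pool:
        sizes = asyncio.run(clean_all_async(pool, "."))
    print(f"processed {len(sizes)} files")
```

With a process pool, `cleaner` and its arguments must be picklable, which is why it is a module-level function here. For this workload the plain `multiprocessing.Pool` version is simpler; the asyncio wrapper mostly pays off when the program already has other coroutines to run alongside.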

