How to parallelize or use multi-cores to speed up a while loop?
I have an instance with a 16-core processor, and I have a while loop like the one below:
import os
import numpy as np
from tqdm import tqdm

path = "/home/user12/pdf_files"  # root folder containing subfolders of files

count = 200000
num = 0
pbar = tqdm(total=count)
lst = []
while num <= count:
    random_folder = os.path.join(path, np.random.choice(os.listdir(path)))
    file_path = os.path.join(random_folder, np.random.choice(os.listdir(random_folder)))
    if not os.path.isdir(file_path):
        lst.append(file_path)
        pbar.update(1)
        num += 1
When I try to run this code on the server, the estimated time is really long:
0%| | 138/200000 [02:14<51:25:11, 1.08it/s]
I have tried using numpy for the random choice, but it is still slow. Is there any way to use my multi-core CPU to speed up this while loop? It just collects random files from subfolders. Any help is greatly appreciated. Thanks.
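Independent of multiprocessing, most of the time in a loop like this goes into calling os.listdir on every iteration. A minimal sketch of listing each directory once and sampling from the cached listings (the temp tree below is a made-up stand-in for the real path):

```python
import os
import random
import tempfile

# Build a tiny throwaway folder/file tree to sample from (illustration only).
path = tempfile.mkdtemp()
for folder in ("a", "b"):
    os.makedirs(os.path.join(path, folder))
    for name in ("x.pdf", "y.pdf"):
        open(os.path.join(path, folder, name), "w").close()

# List each directory exactly once, instead of on every loop iteration.
folders = os.listdir(path)
listings = {f: os.listdir(os.path.join(path, f)) for f in folders}

lst = []
for _ in range(5):
    folder = random.choice(folders)
    lst.append(os.path.join(path, folder, random.choice(listings[folder])))

print(len(lst))  # 5 sampled file paths
```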
Update:
import os
import random
from multiprocessing import Pool

path = "/home/user12/pdf_files"

def get_random_file(num_of_files):
    count = 0
    random_files = []
    while count < num_of_files:
        random_folder = os.path.join(path, random.choice(os.listdir(path)))
        file_path = os.path.join(random_folder, random.choice(os.listdir(random_folder)))
        if not os.path.isdir(file_path):
            random_files.append(file_path)
            count += 1
    return random_files

with Pool(16) as p:
    random_files = p.map(get_random_file, [1000 // 16] * 16)
You can use multiprocessing and use all of the cores at the same time.
See https://docs.python.org/3.8/library/multiprocessing.html
Something like this:
from multiprocessing import Pool

def get_random_file(num_of_files):
    # your logic goes here
    count = 0
    random_files = []
    while count < num_of_files:
        count += 1
        # get a random file and append it to 'random_files'
    return random_files

if __name__ == '__main__':
    with Pool(16) as p:
        num_of_files = [200000 // 16 for i in range(16)]  # one chunk of 12500 per core
        random_files = p.map(get_random_file, num_of_files)
        # random_files is a list of lists - you need to merge them into one list
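The merge mentioned in the last comment can be done with itertools.chain; a minimal sketch, with made-up per-worker results standing in for what p.map returns:

```python
from itertools import chain

# Stand-in for p.map's return value: one list of file paths per worker.
per_worker = [["a.pdf", "b.pdf"], ["c.pdf"], []]

# Flatten the list of lists into a single list.
merged = list(chain.from_iterable(per_worker))
print(merged)  # ['a.pdf', 'b.pdf', 'c.pdf']
```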