How to parallelize or use multi-cores to speed up a while loop?
I have an instance with a 16-core processor, and I have a while loop like the one below:
import os
import numpy as np
from tqdm import tqdm

path = "/home/user12/pdf_files"  # root folder containing subfolders of files

count = 200000
num = 0
pbar = tqdm(total=count)
lst = []
while num <= count:
    random_folder = os.path.join(path, np.random.choice(os.listdir(path)))
    file_path = os.path.join(random_folder, np.random.choice(os.listdir(random_folder)))
    if not os.path.isdir(file_path):
        lst.append(file_path)
        pbar.update(1)
        num += 1
When I try to run this code on the server, the estimated time is really long:
0%| | 138/200000 [02:14<51:25:11, 1.08it/s]
I have tried using numpy for the random choice, but it is still slow. Is there any way to use my multi-core CPU to speed up this while loop? It just collects random files from subfolders. Any help is greatly appreciated. Thanks.
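Independent of multiprocessing, most of the time in a loop like this goes into calling os.listdir on every iteration. A minimal sketch of listing each directory once and sampling from the cached listings (the temp tree below is a made-up stand-in for the real path):

```python
import os
import random
import tempfile

# Build a tiny throwaway folder/file tree to sample from (illustration only).
path = tempfile.mkdtemp()
for folder in ("a", "b"):
    os.makedirs(os.path.join(path, folder))
    for name in ("x.pdf", "y.pdf"):
        open(os.path.join(path, folder, name), "w").close()

# List each directory exactly once, instead of on every loop iteration.
folders = os.listdir(path)
listings = {f: os.listdir(os.path.join(path, f)) for f in folders}

lst = []
for _ in range(5):
    folder = random.choice(folders)
    lst.append(os.path.join(path, folder, random.choice(listings[folder])))

print(len(lst))  # 5 sampled file paths
```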
Update:
import os
import random
from multiprocessing import Pool

path = "/home/user12/pdf_files"

def get_random_file(num_of_files):
    count = 0
    random_files = []
    while count < num_of_files:
        random_folder = os.path.join(path, random.choice(os.listdir(path)))
        file_path = os.path.join(random_folder, random.choice(os.listdir(random_folder)))
        if not os.path.isdir(file_path):
            random_files.append(file_path)
            count += 1
    return random_files

with Pool(16) as p:
    random_files = p.map(get_random_file, [1000 // 16] * 16)
You can use multiprocessing and use all of the cores at the same time.
See https://docs.python.org/3.8/library/multiprocessing.html
Something like this:
from multiprocessing import Pool

def get_random_file(num_of_files):
    # your logic goes here
    count = 0
    random_files = []
    while count < num_of_files:
        count += 1
        # get a random file and append it to 'random_files'
    return random_files

if __name__ == '__main__':
    with Pool(16) as p:
        num_of_files = [200000 // 16 for i in range(16)]  # one chunk of 12500 per core
        random_files = p.map(get_random_file, num_of_files)
        # random_files is a list of lists - you need to merge them into one list
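The merge mentioned in the last comment can be done with itertools.chain; a minimal sketch, with made-up per-worker results standing in for what p.map returns:

```python
from itertools import chain

# Stand-in for p.map's return value: one list of file paths per worker.
per_worker = [["a.pdf", "b.pdf"], ["c.pdf"], []]

# Flatten the list of lists into a single list.
merged = list(chain.from_iterable(per_worker))
print(merged)  # ['a.pdf', 'b.pdf', 'c.pdf']
```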