Multiprocessing in a Python loop
I am generating negative pairs from positive pairs, and I would like to speed up the process by using all cores of the CPU. On a single CPU core it takes almost five days of running around the clock.
I would like to convert the code below to use multiprocessing. Note that I do not yet have the "positives_negatives.csv" file.
if Path("positives_negatives.csv").exists():
    df = pd.read_csv("positives_negatives.csv")
else:
    for combo in tqdm(itertools.combinations(identities.values(), 2), desc="Negatives"):
        for cross_sample in itertools.product(combo[0], combo[1]):
            negatives = negatives.append(pd.Series({"file_x": cross_sample[0], "file_y": cross_sample[1]}).T,
                                         ignore_index=True)
    negatives["decision"] = "No"
    negatives = negatives.sample(positives.shape[0])
    df = pd.concat([positives, negatives]).reset_index(drop=True)
    df.to_csv("positives_negatives.csv", index=False)
Modified code
def multi_func(iden, negatives):
    for combo in tqdm(itertools.combinations(iden.values(), 2), desc="Negatives"):
        for cross_sample in itertools.product(combo[0], combo[1]):
            negatives = negatives.append(pd.Series({"file_x": cross_sample[0], "file_y": cross_sample[1]}).T,
                                         ignore_index=True)
Used as follows
if Path("positives_negatives.csv").exists():
    df = pd.read_csv("positives_negatives.csv")
else:
    with concurrent.futures.ProcessPoolExecutor() as executor:
        secs = [5, 4, 3, 2, 1]
        results = executor.map(multi_func(identities, negatives), secs)
    negatives["decision"] = "No"
    negatives = negatives.sample(positives.shape[0])
    df = pd.concat([positives, negatives]).reset_index(drop=True)
    df.to_csv("positives_negatives.csv", index=False)
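A note on why the attempt above probably does not run in parallel: executor.map(multi_func(identities, negatives), secs) calls multi_func right away in the parent process and then passes its return value (None) to map as if it were the function. map expects the function object itself, followed by an iterable of arguments. Worker processes also receive pickled copies of their arguments, so rebinding negatives inside the worker is not visible to the parent; results have to be returned. A minimal sketch of the calling convention, using a hypothetical worker named square that is not part of the original code:

```python
from concurrent.futures import ProcessPoolExecutor


def square(n):
    # Hypothetical worker: must be a top-level function so it can be pickled
    return n * n


def run(numbers):
    # Pass the function object and the iterable separately; do not call the function here
    with ProcessPoolExecutor() as executor:
        # map yields results in the same order as the inputs
        return list(executor.map(square, numbers))


if __name__ == "__main__":
    print(run([5, 4, 3, 2, 1]))
```

The if __name__ == "__main__" guard matters on platforms that start workers by re-importing the main module.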
A good approach is to use the ProcessPoolExecutor class together with a separate worker function. You can do it like this:
Libraries
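import itertools
from concurrent.futures.process import ProcessPoolExecutor
from os import cpu_count
import more_itertools
import pandas as pd

def compute_cross_samples(x):
    # x is a pair of file lists (one per identity); pair every file in the first with every file in the second
    return pd.DataFrame(itertools.product(*x), columns=["file_x", "file_y"])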
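To see what the worker produces, here is a small sketch that calls compute_cross_samples on two made-up file lists. The file names are hypothetical, and the product is wrapped in list() only to make the input to the DataFrame constructor explicit.

```python
import itertools

import pandas as pd


def compute_cross_samples(x):
    # x is a pair of file lists, one list per identity
    return pd.DataFrame(list(itertools.product(*x)), columns=["file_x", "file_y"])


# Hypothetical files for two different identities
pair = (["a1.jpg", "a2.jpg"], ["b1.jpg"])
frame = compute_cross_samples(pair)
print(frame)
```

Each row is one negative pair: a file from the first identity matched with a file from the second.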
Modified code
if Path("positives_negatives.csv").exists():
    df = pd.read_csv("positives_negatives.csv")
else:
    with ProcessPoolExecutor() as pool:
        # take cpu_count combinations from identities.values
        for combos in tqdm(more_itertools.ichunked(itertools.combinations(identities.values(), 2), cpu_count())):
            # for each chunk of combinations, compute the cross products in the worker processes
            for cross_samples in pool.map(compute_cross_samples, combos):
                # append each returned DataFrame of cross samples to negatives
                # (pd.concat is used because DataFrame.append was removed in pandas 2.0)
                negatives = pd.concat([negatives, cross_samples])
    negatives["decision"] = "No"
    negatives = negatives.sample(positives.shape[0])
    df = pd.concat([positives, negatives]).reset_index(drop=True)
    df.to_csv("positives_negatives.csv", index=False)
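Growing negatives inside the loop copies the whole frame on every iteration, which may itself account for a large part of the runtime. A possible variation, offered as a sketch rather than a tested replacement, collects the per-pair frames in a list and concatenates once. The helper name build_negatives, its executor parameter, the chunksize value, and the toy identities are all assumptions made for illustration.

```python
import itertools
from concurrent.futures import ProcessPoolExecutor

import pandas as pd


def compute_cross_samples(pair):
    # Pair every file of one identity with every file of the other
    return pd.DataFrame(list(itertools.product(*pair)), columns=["file_x", "file_y"])


def build_negatives(identities, executor=None, chunksize=256):
    # Hypothetical helper: runs serially when no executor is given
    combos = itertools.combinations(identities.values(), 2)
    if executor is None:
        frames = list(map(compute_cross_samples, combos))
    else:
        # chunksize batches tasks to reduce inter-process overhead
        frames = list(executor.map(compute_cross_samples, combos, chunksize=chunksize))
    if not frames:
        return pd.DataFrame(columns=["file_x", "file_y", "decision"])
    # Concatenate once instead of growing the frame inside the loop
    negatives = pd.concat(frames, ignore_index=True)
    negatives["decision"] = "No"
    return negatives


if __name__ == "__main__":
    # Toy data standing in for the real identities dict
    identities = {"a": ["a1", "a2"], "b": ["b1"], "c": ["c1", "c2"]}
    with ProcessPoolExecutor() as pool:
        negatives = build_negatives(identities, pool)
    print(len(negatives))
```

This keeps every cross pair in memory before sampling, so with a very large number of identities it may be necessary to sample or write out results in batches instead.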