I am using a function that takes too long to finish, because it processes a large input with two nested for loops.
Here is the function:
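```python
import re

import numpy as np


def transform(self, X):
    global brands  # list of brand names defined elsewhere in the module
    result = []
    for x in X:
        index = 0
        count = 0
        # Find the brand that occurs most often (case-insensitive) in x.
        for brand in brands:
            all_matches = re.findall(re.escape(brand), x, flags=re.I)
            count_all_match = len(all_matches)
            if count_all_match > count:
                count = count_all_match
                index = brands.index(brand)
        result.append([index])
    return np.array(result)
```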
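To make it concrete what this function returns, here is the same loop as a standalone function run on a small made-up `brands` list and input (both hypothetical, not from the question). It uses `enumerate` instead of `brands.index`, which gives the same index without a second linear search. Note that index 0 is returned both when the first brand wins and when no brand matches at all.

```python
import re

import numpy as np


def best_brand_indices(X, brands):
    # Same logic as transform above, but without `self` or a global:
    # for each string, the index of the most frequent brand (case-insensitive).
    result = []
    for x in X:
        index, count = 0, 0
        for i, brand in enumerate(brands):
            n = len(re.findall(re.escape(brand), x, flags=re.I))
            if n > count:  # strict ">" keeps the earlier brand on ties
                index, count = i, n
        result.append([index])
    return np.array(result)


brands = ["Nike", "Adidas", "Puma"]  # hypothetical example data
X = ["nike and PUMA", "adidas, Puma, puma", "no brand here"]
print(best_brand_indices(X, brands).ravel().tolist())  # [0, 2, 0]
```

The first row is a tie (one "nike", one "PUMA"), so the earlier brand wins; the third row has no match and also yields 0.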
How can I change this function so that it uses multiprocessing to reduce the running time?
I don't see `self` being used anywhere in the `transform` method, so I turned it into a plain module-level function.
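```python
import re
from concurrent.futures import ProcessPoolExecutor

import numpy as np

# `brands` is expected to be a module-level list. With the "spawn" start
# method (default on Windows and macOS) workers re-import this module, so a
# `brands` assigned only inside `if __name__ == "__main__":` would be missing.


def transformer(x):
    # Index of the brand that occurs most often in x (0 if none match).
    index = 0
    count = 0
    for i, brand in enumerate(brands):
        count_all_match = len(re.findall(re.escape(brand), x, flags=re.I))
        if count_all_match > count:
            count = count_all_match
            index = i
    return [index]


def transform(X):
    # Each element of X is processed in a worker; map preserves input order.
    # On spawn platforms, call this from inside `if __name__ == "__main__":`.
    with ProcessPoolExecutor() as executor:
        result = executor.map(transformer, X)
    return np.array(list(result))
```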