
Python - speed up iterations over permutation list (with multithreading?)

I have generated a list of permutations in Python with either itertools.permutations (all permutations) or numpy.random.Generator.permuted (a sample of the permutations), depending on how many permutations there are in total. This part of the code works well and runs quickly.
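A minimal sketch of the two generation routes, for illustration only. The sizes (4 blocks for the exhaustive case, 1,000 sampled rows of 16 blocks) and the variable names are assumptions, not the real data:

```python
from itertools import permutations

import numpy as np

# Exhaustive route: every ordering of a small index set (4! = 24 tuples).
small = list(permutations(range(4)))

# Sampling route: stack the same index row many times, then shuffle
# each row independently with Generator.permuted(axis=1).
rng = np.random.default_rng(1234)
n_blocks, n_samples = 16, 1000          # illustrative sizes (assumption)
base = np.tile(np.arange(n_blocks), (n_samples, 1))
sampled = rng.permuted(base, axis=1)    # shape (n_samples, n_blocks)
```

Rows produced by the sampling route are drawn independently, so the same ordering can appear more than once.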

However, the resulting list is large (100k entries or more), and I would like to process it with multiple threads, but I don't really know how to do that.

Here is what I have so far. The code works, but it is inefficient and takes a long time to complete. I have tried to use multiprocessing.Pool(), but I have not been able to make it work.

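from itertools import permutations

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

avg = {'Block': np.arange(1, 17, 1),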
        'A': np.random.uniform(0,1,16),
        'B': np.random.uniform(0,1,16),
        'C': np.random.uniform(0,1,16)
}
avg = pd.DataFrame(avg)
seed = 1234
thr = 100000

def permuta (avg, seed, thr):
    kpi = []

    # Permutations
    if len(pd.unique(avg.Block)) > 9:
        rng = np.random.default_rng(seed)     
        perm = rng.permuted(np.tile(pd.unique((avg.Block)-1).astype(int), thr).reshape(thr, (pd.unique(avg.Block)-1).size), axis=1)
        aa = list(perm)
        aa1 = [tuple(x) for x in aa]
        bb = [np.arange(0, len(avg))]
        bb1 = tuple(bb[0])
                
        if bb1 not in aa1:
            bb.extend(aa)
            aa = [tuple(x) for x in bb]
        else:
            aa = [tuple(x) for x in aa]
    else:
        perm = permutations(avg.index)
        aa = list(perm)

    n0 = len(aa)
    for i in aa:
        if (aa.index(i)+1) % 1000 == 0:
            print('    progress: {:.2f}%'.format((aa.index(i)+1)/n0*100))
        df = avg.loc[list(i)]
        df.reset_index(drop=True, inplace=True)
        
        model_A = LinearRegression(fit_intercept=True).fit(df.index.values.reshape(-1,1), df.A)
        model_B = LinearRegression(fit_intercept=True).fit(df.index.values.reshape(-1,1), df.B)
        model_C = LinearRegression(fit_intercept=True).fit(df.index.values.reshape(-1,1), df.C)
        block_order_id = tuple(x+1 for x in i)
        model_kpi = [block_order_id, model_A.coef_[0], model_B.coef_[0], model_C.coef_[0]]
        
        kpi.append(model_kpi)
    
    kpi = pd.DataFrame (kpi, columns = ['Block_ord', 'm_A', 'm_B', 'm_C'])
      
    return kpi

I would be grateful if someone could help me speed up the code, whether by using all cores for the calculations, by replacing the "for" loop with a more efficient iterator, or by a mix of both.

Thanks for your help.

I usually use concurrent.futures, as described at https://docs.python.org/3/library/concurrent.futures.html

Since you mentioned having trouble getting some implementations set up, here's a rough example below that you just need to tweak a bit.

In this example, I'm using ThreadPoolExecutor, which you can try out and tune to your machine. However, if you want to use separate processes instead, you can use ProcessPoolExecutor, which has similar syntax and is also explained in the docs linked above.

import concurrent.futures
from itertools import permutations

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

avg = {'Block': np.arange(1, 17, 1),
       'A': np.random.uniform(0, 1, 16),
       'B': np.random.uniform(0, 1, 16),
       'C': np.random.uniform(0, 1, 16)
}
avg = pd.DataFrame(avg)
seed = 1234
thr = 100000

def permuta(avg, seed, thr):
    # Permutations
    if len(pd.unique(avg.Block)) > 9:
        rng = np.random.default_rng(seed)
        perm = rng.permuted(
            np.tile(pd.unique((avg.Block) - 1).astype(int), thr).reshape(thr, (pd.unique(avg.Block) - 1).size),
            axis=1)
        aa = list(perm)
        aa1 = [tuple(x) for x in aa]
        bb = [np.arange(0, len(avg))]
        bb1 = tuple(bb[0])

        if bb1 not in aa1:
            bb.extend(aa)
            aa = [tuple(x) for x in bb]
        else:
            aa = [tuple(x) for x in aa]
    else:
        perm = permutations(avg.index)
        aa = list(perm)

    n0 = len(aa)
    with concurrent.futures.ThreadPoolExecutor(max_workers=24) as executor:
        # One task per permutation; avg is passed explicitly so the worker
        # does not depend on variables local to this function.
        futures = {executor.submit(everything_in_for_loop, i, avg): i for i in aa}
        error_count = 0
        results = []  # the results will be placed into this list
        for n, future in enumerate(futures, start=1):
            if n % 1000 == 0:
                print('    progress: {:.2f}%'.format(n / n0 * 100))
            try:
                result = future.result()
                if result is not None:
                    results.append(result)
            except Exception as exc:
                error_count += 1
                print(type(exc))
                print(exc.args)

    kpi = pd.DataFrame(results, columns=['Block_ord', 'm_A', 'm_B', 'm_C'])
    return kpi

def everything_in_for_loop(i, avg):
    # Body of the original "for" loop: fit the three models for one
    # permutation i and return a single result row.
    df = avg.loc[list(i)]
    df.reset_index(drop=True, inplace=True)

    model_A = LinearRegression(fit_intercept=True).fit(df.index.values.reshape(-1, 1), df.A)
    model_B = LinearRegression(fit_intercept=True).fit(df.index.values.reshape(-1, 1), df.B)
    model_C = LinearRegression(fit_intercept=True).fit(df.index.values.reshape(-1, 1), df.C)
    block_order_id = tuple(x + 1 for x in i)
    return [block_order_id, model_A.coef_[0], model_B.coef_[0], model_C.coef_[0]]
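As an alternative to parallelising the loop, the per-permutation fits could be replaced by one vectorised NumPy computation. This is a swapped-in technique, not part of the example above: a minimal sketch, assuming the only quantity needed is the slope of each column against its position 0..n-1 (the value that LinearRegression(...).coef_[0] gives in the loop). The function name permutation_slopes and the array sizes are hypothetical.

```python
import numpy as np

def permutation_slopes(values, perms):
    """Least-squares slope of each column against position 0..n-1,
    computed for every permutation at once.

    values: (n, k) array, one column per series (e.g. A, B, C)
    perms:  (p, n) integer array, each row a permutation of 0..n-1
    returns a (p, k) array of slopes
    """
    values = np.asarray(values, dtype=float)
    perms = np.asarray(perms)
    n = values.shape[0]
    x = np.arange(n, dtype=float)
    xc = x - x.mean()              # centred positions, sum(xc) == 0
    reordered = values[perms]      # fancy indexing, shape (p, n, k)
    # slope = sum(xc * y) / sum(xc**2); y needs no centring because sum(xc) == 0
    return np.einsum('j,pjk->pk', xc, reordered) / (xc @ xc)

rng = np.random.default_rng(1234)
values = rng.uniform(0, 1, size=(16, 3))   # stand-in for columns A, B, C
perms = rng.permuted(np.tile(np.arange(16), (1000, 1)), axis=1)
slopes = permutation_slopes(values, perms)  # shape (1000, 3)
```

With 100,000 permutations of 16 blocks and 3 columns, the intermediate array holds about 4.8 million floats, so it should fit in memory in one pass; for much larger inputs the rows of perms could be processed in chunks.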


 