
Python, running a loop on several CPUs

I wrote a small piece of code that works similarly to sklearn's grid search. It trains the model (on X and y in the code below) on one set of hyperparameters, checks the model's performance on validation data (Xt, yt_class) using several metrics, and stores the results in a pandas DataFrame.

    import pandas as pd
    from sklearn.model_selection import ParameterGrid  # was sklearn.grid_search before 0.20
    from sklearn.metrics import precision_score, f1_score
    from sklearn.svm import SVC

    grid = {'C': [1, 10.0, 50, 100.0], 'gamma': [0.00001, 0.0001, 0.001, 0.01, 0.1]}
    param_grid = ParameterGrid(grid)
    results = pd.DataFrame(list(param_grid))
    precision = []
    f1 = []
    for params in param_grid:
        model = SVC(kernel='rbf', cache_size=1000, class_weight='balanced', **params)
        model.fit(X, y)
        predictions = model.predict(Xt)  # predict once and reuse for both metrics
        precision.append(precision_score(yt_class, predictions, average='weighted'))
        f1.append(f1_score(yt_class, predictions, average='weighted'))
        print(params)
        print(precision[-1])
        print(f1[-1])

    results['precision'] = precision
    results['f1'] = f1

Now I am trying to make my loop run on several CPUs. I tried following basic examples for the multiprocessing module, but being new to Python and to programming overall, I wasn't able to figure out how to make it work in my case.

Example of what does not work:

import multiprocessing as mp
pool = mp.Pool(processes=8)

def get_scores(param_grid):
    precision = []
    f1 = []
    for params in param_grid:
        model = SVC(kernel='rbf',cache_size=1000,class_weight='balanced',**params)
        model.fit(X,y)
        model.predict(Xt)
        precision.append(precision_score(yt_class, model.predict(Xt), average='weighted'))
        f1.append(f1_score(yt_class, model.predict(Xt), average='weighted'))
    return precision,f1    

scores = pool.apply(get_scores,param_grid)

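Your get_scores function should contain only the body of the loop, so that it handles a single parameter combination. The pool's map call then runs it once per element of param_grid and spreads those calls across the worker processes.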

Try this:

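import multiprocessing as mp
from sklearn.model_selection import ParameterGrid  # was sklearn.grid_search before 0.20
from sklearn.metrics import precision_score, f1_score
from sklearn.svm import SVC

# X, y, Xt and yt_class are assumed to be defined at module level, as in the question.

def get_scores(params):
    # Fit one model for a single parameter combination and score it on the validation data.
    model = SVC(kernel='rbf', cache_size=1000, class_weight='balanced', **params)
    model.fit(X, y)
    predictions = model.predict(Xt)  # predict once and reuse for both metrics
    precision = precision_score(yt_class, predictions, average='weighted')
    f1 = f1_score(yt_class, predictions, average='weighted')
    return precision, f1


if __name__ == '__main__':  # needed where worker processes are spawned (e.g. Windows)
    grid = {'C': [1, 10.0, 50, 100.0], 'gamma': [0.00001, 0.0001, 0.001, 0.01, 0.1]}
    param_grid = ParameterGrid(grid)
    pool = mp.Pool(processes=8)

    scores = pool.map_async(get_scores, param_grid).get()
    pool.close()
    pool.join()
    # scores is a list of tuples [(precision_1, f1_1), (precision_2, f1_2)...]
    # you can "unzip" it like this

    precision, f1 = zip(*scores)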
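
The unzip step leaves precision and f1 as separate tuples, while the question wanted one table with a row per parameter combination. Below is a minimal, standard-library-only sketch of that last step. It is not the code from the question: expand_grid, score_one, as_rows and run_parallel are hypothetical names, score_one is a made-up stand-in for fitting and scoring the SVC, and the data is passed to the workers explicitly through functools.partial rather than read from globals.

```python
import itertools
import multiprocessing as mp
from functools import partial


def expand_grid(grid):
    """Turn {'C': [1, 10], 'gamma': [0.1]} into a list with one dict per combination."""
    keys = sorted(grid)
    return [dict(zip(keys, values))
            for values in itertools.product(*(grid[k] for k in keys))]


def score_one(params, data):
    """Hypothetical stand-in for get_scores: returns (precision, f1) for one setting.

    A real worker would fit the model and compute the metrics here; the
    arithmetic below only produces deterministic placeholder numbers.
    """
    scale = sum(data) / len(data)
    precision = scale / (1.0 + params["C"])
    f1 = scale * params["gamma"]
    return precision, f1


def as_rows(param_list, scores):
    """Merge each parameter dict with its (precision, f1) pair into one row dict."""
    return [dict(params, precision=p, f1=f)
            for params, (p, f) in zip(param_list, scores)]


def run_parallel(param_list, data, processes=2):
    """Score every parameter dict in a worker pool; results keep the input order."""
    worker = partial(score_one, data=data)  # data is passed explicitly, not via globals
    with mp.Pool(processes=processes) as pool:
        return pool.map(worker, param_list)


if __name__ == "__main__":
    grid = {"C": [1, 10.0], "gamma": [0.001, 0.01]}
    param_list = expand_grid(grid)
    scores = run_parallel(param_list, data=[1.0, 2.0, 3.0])
    for row in as_rows(param_list, scores):
        print(row)
```

Because pool.map returns results in the same order as its input, zipping the parameter list with the scores is safe. With pandas available, pd.DataFrame(as_rows(param_list, scores)) should give a table equivalent to the results DataFrame in the question.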


 