
How do I combine/ensemble results of 3 machine learning models stored in 3 dataframes and output 1 dataframe with results agreed by majority?

I am currently participating in an online hackathon. All the top entries are within 1% of each other. So I decided to run three different models instead of a single best-performing one (i.e. ensemble learning), tune the hyperparameters of each, and then combine the results of all three to get a better model. I've combined the results of all three in a dataframe; its df.head() is shown below:

index | building_id | rf_damage_grade | xg_damage_grade | lr_damage_grade | damage_grade
0     | a3380c4f75  | Grade 4         | Grade 2         | Grade 3         | Grade 4
1     | a338a4e653  | Grade 5         | Grade 5         | Grade 5         | Grade 5
2     | a338a4e6b7  | Grade 5         | Grade 5         | Grade 5         | Grade 5
3     | a33a6eaa3a  | Grade 3         | Grade 2         | Grade 4         | Grade 3
4     | a33b073ff6  | Grade 5         | Grade 5         | Grade 5         | Grade 5

So 'rf_damage_grade' is the column from my best classifier. It gives around 74% accuracy; the other two give 68% and 58% respectively. In the final output, if 'xg_damage_grade' and 'lr_damage_grade' both agree on a value, I want 'damage_grade' to be changed to that value; otherwise it should remain equal to 'rf_damage_grade'. There are more than 400k rows in the data, and every time I rerun my model this step takes around an hour on my Early 2015 MBP. Following is the code I've written:

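count = 0  # rows where the agreed value differs from the rf prediction
for i in range(len(final)):
    # positions (assuming building_id is column 0): 1 = rf, 2 = xg, 3 = lr, 4 = damage_grade
    if final.iloc[i,2]==final.iloc[i,3]:
        final.iloc[i,4]=final.iloc[i,2]
        if final.iloc[i,3]!=final.iloc[i,1]:
            count+=1
    else:
        continue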

What can I do to make it more efficient? Is there any built-in function in sklearn to do this sort of thing?

Simply run conditional logic with .loc:

df.loc[df['xg_damage_grade'] == df['lr_damage_grade'], 'damage_grade'] = df['xg_damage_grade']
df.loc[df['xg_damage_grade'] != df['lr_damage_grade'], 'damage_grade'] = df['rf_damage_grade']

Or with numpy's where:

import numpy as np

df['damage_grade'] = np.where(df['xg_damage_grade'] == df['lr_damage_grade'],
                              df['xg_damage_grade'],
                              df['rf_damage_grade'])
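
As a quick check, the same rule can be run end to end on the five sample rows from the question. This is only a sketch under a few assumptions: the sixth row (building_id 'hypothetical1') is invented purely to show an override, and the names agree and count are mine rather than anything from the original code. It also computes a vectorized equivalent of the loop's count.

```python
import numpy as np
import pandas as pd

# The five sample rows from the question, plus one hypothetical row
# ("hypothetical1") added only to demonstrate an override.
df = pd.DataFrame({
    "building_id": ["a3380c4f75", "a338a4e653", "a338a4e6b7",
                    "a33a6eaa3a", "a33b073ff6", "hypothetical1"],
    "rf_damage_grade": ["Grade 4", "Grade 5", "Grade 5",
                        "Grade 3", "Grade 5", "Grade 4"],
    "xg_damage_grade": ["Grade 2", "Grade 5", "Grade 5",
                        "Grade 2", "Grade 5", "Grade 3"],
    "lr_damage_grade": ["Grade 3", "Grade 5", "Grade 5",
                        "Grade 4", "Grade 5", "Grade 3"],
})

# True where the two weaker models agree with each other.
agree = df["xg_damage_grade"] == df["lr_damage_grade"]

# Use the agreed value where they agree, otherwise keep the random forest's.
df["damage_grade"] = np.where(agree,
                              df["xg_damage_grade"],
                              df["rf_damage_grade"])

# Vectorized version of the loop's counter: rows where the agreed value
# differs from the random forest prediction, i.e. rows actually overridden.
count = int((agree & (df["xg_damage_grade"] != df["rf_damage_grade"])).sum())

print(df)
print("rows overridden:", count)
```

With exactly three models, this rule is the same as a majority vote that falls back to the random forest when all three disagree, since any two-model majority that includes rf already matches the rf column.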


