
How can I add a new column to a pandas dataframe based on groupby function output?

I have a dataframe1 which contains 500,000 rows. I want to populate the Configuration column by looking up each Model number in dataframe2, which contains the configurations.

Dataframe1:

 Model                 Date     Status   Configuration
 A4                    10/2014  Inop      
 A4                    11/2014  Op              
 A4                    11/2014  Op                                     
 G5                    10/2014  Inop                                   
 G5                    11/2014  Inop                                   
 G5                    11/2014  Op                                     
 G8                    10/2014  Op                                     
 G8                    11/2014  Op                                     
 G8                    11/2014  Op                                     
 G8                    10/2014  Inop                                   
 Z2                    11/2014  Op                                     
 Z2                    11/2014  Op                                     

Dataframe2:

 Model              Configuration  
 A4                 ICS   
 G5                 PCS  
 G8                 ICS    
 Z2                 1/2 ICS   

Code I am currently running:

for Model, group in dataframe1.groupby('Model'):
    # gets configuration from dataframe2
    config = get_configuration(Model)
    # attempt to assign configuration to all rows with that model number in dataframe1
    dataframe1['Config'] = config

This code returns:

This code groups dataframe1 by model and successfully gets each group's configuration, but I cannot apply that configuration to a new column in dataframe1 to get the following result:

 Model                 Date     Status   Configuration
 A4                    10/2014  Inop     ICS   
 A4                    11/2014  Op       ICS     
 A4                    11/2014  Op       ICS     
 G5                    10/2014  Inop     PCS   
 G5                    11/2014  Inop     PCS  
 G5                    11/2014  Op       PCS
 G8                    10/2014  Op       ICS 
 G8                    11/2014  Op       ICS      
 G8                    11/2014  Op       ICS      
 G8                    10/2014  Inop     ICS     
 Z2                    11/2014  Op       1/2 ICS 
 Z2                    11/2014  Op       1/2 ICS
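
A note on why the loop above does not produce this result: `dataframe1['Config'] = config` assigns to the whole column on every iteration, so each group overwrites the previous one and only the last model's configuration survives. A minimal sketch of a loop that writes only to the rows of the current group is shown below. The sample frames are shortened, and `get_configuration` here is a hypothetical stand-in for the helper in the question, whose real implementation is not shown.

```python
import pandas as pd

dataframe1 = pd.DataFrame({
    "Model": ["A4", "A4", "G5", "G8", "Z2"],
    "Status": ["Inop", "Op", "Inop", "Op", "Op"],
})
dataframe2 = pd.DataFrame({
    "Model": ["A4", "G5", "G8", "Z2"],
    "Configuration": ["ICS", "PCS", "ICS", "1/2 ICS"],
})

# Hypothetical stand-in for the get_configuration helper from the question.
lookup = dataframe2.set_index("Model")["Configuration"]


def get_configuration(model):
    return lookup[model]


# Create the target column up front so every row has a slot to fill.
dataframe1["Config"] = None

for model, group in dataframe1.groupby("Model"):
    config = get_configuration(model)
    # group.index holds the row labels of this model only, so the
    # assignment touches just those rows instead of the whole column.
    dataframe1.loc[group.index, "Config"] = config

print(dataframe1)
```

This keeps the original structure, but a Python-level loop over groups is usually slower than a vectorized lookup on a frame with 500,000 rows.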

Use map:

Dataframe1['Config'] = Dataframe1['Model'].map(Dataframe2.set_index('Model')['Configuration'])
Dataframe1

   Model     Date Status   Config
0     A4  10/2014   Inop      ICS
1     A4  11/2014     Op      ICS
2     A4  11/2014     Op      ICS
3     G5  10/2014   Inop      PCS
4     G5  11/2014   Inop      PCS
5     G5  11/2014     Op      PCS
6     G8  10/2014     Op      ICS
7     G8  11/2014     Op      ICS
8     G8  11/2014     Op      ICS
9     G8  10/2014   Inop      ICS
10    Z2  11/2014     Op  1/2 ICS
11    Z2  11/2014     Op  1/2 ICS
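
For anyone who wants to try the map approach end to end, here is a small self-contained sketch. It assumes the lookup column in Dataframe2 is named `Configuration`, as in the question, and adds a hypothetical model `X9` that has no entry in Dataframe2 to show what happens to unmatched rows. The lookup Series should hold one entry per Model, so the sketch drops duplicate models first.

```python
import pandas as pd

Dataframe1 = pd.DataFrame({
    # "X9" is a made-up model with no configuration, for illustration only.
    "Model": ["A4", "A4", "G5", "G8", "Z2", "X9"],
    "Status": ["Inop", "Op", "Inop", "Op", "Op", "Op"],
})
Dataframe2 = pd.DataFrame({
    "Model": ["A4", "G5", "G8", "Z2"],
    "Configuration": ["ICS", "PCS", "ICS", "1/2 ICS"],
})

# Build a Model -> Configuration lookup with one entry per model.
lookup = (
    Dataframe2.drop_duplicates("Model")
    .set_index("Model")["Configuration"]
)

# map looks up each Model value in the index of the lookup Series.
Dataframe1["Config"] = Dataframe1["Model"].map(lookup)

# Models without a match end up as NaN; collect them for inspection.
missing = Dataframe1.loc[Dataframe1["Config"].isna(), "Model"].unique().tolist()

print(Dataframe1)
print("Models without a configuration:", missing)
```

Checking `missing` after the assignment is a cheap way to catch models that are absent from the lookup table before the NaN values spread into later steps.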

Try pd.merge:

Dataframe1.merge(Dataframe2, on='Model', how='left')
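
Two details are worth keeping in mind with the merge approach: `merge` returns a new frame rather than modifying Dataframe1 in place, and if Dataframe1 already has an empty `Configuration` column (as in the question), pandas will suffix the clashing columns instead of filling the existing one. The sketch below, on shortened sample data, drops the empty column first and uses `validate` to check that each Model appears only once in Dataframe2.

```python
import pandas as pd

Dataframe1 = pd.DataFrame({
    "Model": ["A4", "A4", "G5", "G8", "Z2"],
    "Date": ["10/2014", "11/2014", "10/2014", "11/2014", "11/2014"],
    "Status": ["Inop", "Op", "Inop", "Op", "Op"],
    "Configuration": [None] * 5,  # empty placeholder column, as in the question
})
Dataframe2 = pd.DataFrame({
    "Model": ["A4", "G5", "G8", "Z2"],
    "Configuration": ["ICS", "PCS", "ICS", "1/2 ICS"],
})

merged = (
    Dataframe1
    # Remove the empty placeholder so the merged column keeps its plain name.
    .drop(columns="Configuration")
    # A left join keeps every row of Dataframe1, matched or not;
    # validate raises if a Model is duplicated in Dataframe2.
    .merge(Dataframe2, on="Model", how="left", validate="many_to_one")
)

print(merged)
```

With `how='left'`, the row count of `merged` matches Dataframe1 as long as the right-hand side has unique models, which is what the `validate` argument checks.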
