New_target column based on the third value

Question

I have a DataFrame:

source    target              
jan       feb                               
mar       apr                 
jun                       
feb       aug                                            
apr       jul                                            
oct       dec                     
aug       nov       
dec       may

The output DataFrame should be:

source    target    new_target              
jan       feb       aug                        
mar       apr       jul                  
jun                              
feb       aug       nov                                     
apr       jul       jul                                           
oct       dec       may              
aug       nov       nov
dec       may       may

So the new_target column holds the third value in the chain. For example, following source to target gives jan -> feb -> aug -> nov; since aug is the third value, it is the output in the new_target column for the jan row.

Edit:

source    target    new_target              
jan       feb       aug                        
mar       apr       jul                  
jun                              
feb       aug       nov                                     
apr       jul                                                  
oct       dec       may              
aug       nov       
dec       may

Answer 1

Use Series.map with a Series created by DataFrame.set_index, then fill the unmatched values with Series.fillna:

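# lookup Series: index is source, values are target
s = df.set_index(['source'])['target']
# if source may contain duplicates, keep the first row per source:
#s = df.drop_duplicates('source').set_index(['source'])['target']
# look each target up as a source; fall back to target when there is no match
df['new_target'] = df['target'].map(s).fillna(df['target'])
print(df)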
  source target new_target
0    jan    feb        aug
1    mar    apr        jul
2    jun                  
3    feb    aug        nov
4    apr    jul        jul
5    oct    dec        may
6    aug    nov        nov
7    dec    may        may

EDIT: To leave rows without a match as NaN, omit the fillna call:

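# lookup Series: index is source, values are target
s = df.set_index(['source'])['target']
# if source may contain duplicates, keep the first row per source:
#s = df.drop_duplicates('source').set_index(['source'])['target']
# look each target up as a source; unmatched rows stay NaN
df['new_target'] = df['target'].map(s)
print(df)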
  source target new_target
0    jan    feb        aug
1    mar    apr        jul
2    jun               NaN
3    feb    aug        nov
4    apr    jul        NaN
5    oct    dec        may
6    aug    nov        NaN
7    dec    may        NaN
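For anyone who wants to run both variants end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question and assumes the blank target for jun is an empty string; the names lookup, next_hop, with_fallback and without_fallback are only illustrative and do not appear in the original answer.

```python
import pandas as pd

# Sample data from the question; the blank target is assumed to be an empty string.
df = pd.DataFrame({
    "source": ["jan", "mar", "jun", "feb", "apr", "oct", "aug", "dec"],
    "target": ["feb", "apr", "", "aug", "jul", "dec", "nov", "may"],
})

# Lookup Series: index is source, values are target.
# drop_duplicates keeps the first row per source so the index stays unique.
lookup = df.drop_duplicates("source").set_index("source")["target"]

# One hop further: treat each target as a source and look up its target.
next_hop = df["target"].map(lookup)

# Variant 1: fall back to the row's own target when there is no next hop.
with_fallback = next_hop.fillna(df["target"])

# Variant 2: leave rows without a next hop as missing (NaN).
without_fallback = next_hop

print(df.assign(new_target=with_fallback))
print(df.assign(new_target=without_fallback))
```

The only difference between the two variants is the fillna call, so it is easy to switch depending on whether chain ends should repeat the target or stay empty.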

Answer 2

d = df.dropna().set_index('source').target.to_dict()
df['new_target'] = df.target.apply(lambda x: d.get(x,x))

    source  target  new_target
0   jan     feb     aug
1   mar     apr     jul
2   jun 
3   feb     aug     nov
4   apr     jul     jul
5   oct     dec     may
6   aug     nov     nov
7   dec     may     may
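A runnable sketch of the same dictionary approach is below. It assumes the blank target for jun is a missing value (None) rather than an empty string, which is the case the dropna() call guards against; the name mapping is illustrative.

```python
import pandas as pd

# Sample data from the question; the blank target is assumed to be missing (None).
df = pd.DataFrame({
    "source": ["jan", "mar", "jun", "feb", "apr", "oct", "aug", "dec"],
    "target": ["feb", "apr", None, "aug", "jul", "dec", "nov", "may"],
})

# Build a plain dict source -> target, skipping rows with a missing target.
mapping = df.dropna().set_index("source")["target"].to_dict()

# dict.get(t, t) returns the next hop if t is a known source, otherwise t itself.
df["new_target"] = df["target"].apply(lambda t: mapping.get(t, t))

print(df)
```

Because dict.get takes a default, the fallback to the original target happens in the same step as the lookup, so no separate fillna is needed.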
