New_target column based on 3rd value
I have a DataFrame:
source target
jan feb
mar apr
jun
feb aug
apr jul
oct dec
aug nov
dec may
The output DataFrame would be:
source target new_target
jan feb aug
mar apr jul
jun
feb aug nov
apr jul jul
oct dec may
aug nov nov
dec may may
So the new_target column holds the 3rd value in the chain. For example, following the trace between source and target gives jan->feb->aug->nov; since aug is the 3rd value, it is the output in the new_target column for the jan row.
Edit:
source target new_target
jan feb aug
mar apr jul
jun
feb aug nov
apr jul
oct dec may
aug nov
dec may
Use Series.map with a Series created by DataFrame.set_index, and then Series.fillna to keep the original target wherever no match is found:
s = df.set_index('source')['target']
# if source may contain duplicates:
# s = df.drop_duplicates('source').set_index('source')['target']
df['new_target'] = df['target'].map(s).fillna(df['target'])
print(df)
source target new_target
0 jan feb aug
1 mar apr jul
2 jun
3 feb aug nov
4 apr jul jul
5 oct dec may
6 aug nov nov
7 dec may may
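For reference, here is a self-contained sketch of the same approach that can be run as-is. It rebuilds the sample frame by hand and assumes the blank target for jun is an empty string rather than NaN (the question does not say which); the name lookup is only illustrative.

```python
import pandas as pd

# Sample data from the question; the blank target for "jun" is assumed
# to be an empty string.
df = pd.DataFrame({
    "source": ["jan", "mar", "jun", "feb", "apr", "oct", "aug", "dec"],
    "target": ["feb", "apr", "", "aug", "jul", "dec", "nov", "may"],
})

# Lookup Series: index = source, values = target.
# drop_duplicates guards against repeated sources, since map needs a unique index.
lookup = df.drop_duplicates("source").set_index("source")["target"]

# Follow each target one step further; where the target never appears
# as a source, fall back to the target itself.
df["new_target"] = df["target"].map(lookup).fillna(df["target"])
print(df)
```

Each row is advanced by exactly one extra hop, so a longer chain such as jan->feb->aug->nov stops at aug for the jan row.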
EDIT:
s = df.set_index('source')['target']
# if source may contain duplicates:
# s = df.drop_duplicates('source').set_index('source')['target']
df['new_target'] = df['target'].map(s)
print(df)
source target new_target
0 jan feb aug
1 mar apr jul
2 jun NaN
3 feb aug nov
4 apr jul NaN
5 oct dec may
6 aug nov NaN
7 dec may NaN
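The edited expected output in the question shows blanks rather than NaN for rows with no third value. A minimal sketch of one way to get that, assuming empty strings are acceptable as the placeholder and that the blank target for jun is itself an empty string:

```python
import pandas as pd

# Sample data from the question; the blank target is assumed to be "".
df = pd.DataFrame({
    "source": ["jan", "mar", "jun", "feb", "apr", "oct", "aug", "dec"],
    "target": ["feb", "apr", "", "aug", "jul", "dec", "nov", "may"],
})

lookup = df.drop_duplicates("source").set_index("source")["target"]

# Unmatched targets become NaN after map; replace them with an empty
# string instead of falling back to the target.
df["new_target"] = df["target"].map(lookup).fillna("")
print(df)
```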
d = df.dropna().set_index('source').target.to_dict()
df['new_target'] = df.target.apply(lambda x: d.get(x,x))
source target new_target
0 jan feb aug
1 mar apr jul
2 jun
3 feb aug nov
4 apr jul jul
5 oct dec may
6 aug nov nov
7 dec may may
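The dropna() call in the dictionary version presumably matters when the blank target is stored as NaN rather than as an empty string. A runnable sketch under that assumption (the NaN in the sample data is my choice, not something the question states):

```python
import numpy as np
import pandas as pd

# Same sample data, but with the blank target stored as NaN (an assumption).
df = pd.DataFrame({
    "source": ["jan", "mar", "jun", "feb", "apr", "oct", "aug", "dec"],
    "target": ["feb", "apr", np.nan, "aug", "jul", "dec", "nov", "may"],
})

# dropna() removes the "jun" row, so the dict holds only complete pairs.
d = df.dropna().set_index("source")["target"].to_dict()

# d.get(x, x) returns the next hop if x is a known source, else x itself,
# which leaves the NaN target untouched.
df["new_target"] = df["target"].apply(lambda x: d.get(x, x))
print(df)
```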