How to replace NaN values of a column based on certain values of another column
I have two columns: col1 holds the level of education and col2 the job. col2 has some NaN values, and I want to fill them based on the value in col1. For example, if col1 == 'bachelor' then col2 should be 'teacher', if col1 == 'high school' then col2 should be 'actor', and so on. col1 has 7 different values.
I've tried to create a function like this:
def rep_nan(x):
    if x['col1']=='bachelor':
        x['col2']='teacher'
    elif x['col1']=='blabla':
        x['col2']='blabla'
    .....
    elif x['col1']=='high school':
        x['col2']='actor'
Then I applied it to my dataset:
df.apply(rep_nan,axis=1)
but the result is a column of None values.
Where is the error? Or how else could I do this task?
You can put the mapping in a dictionary:
rep_nan = {
    'bachelor': 'teacher',
    'blabla': 'blabla',
    'high school': 'actor'
}
Then we can replace the NaN values with:
df.loc[df['col2'].isnull(), 'col2'] = df[df['col2'].isnull()]['col1'].replace(rep_nan)
For example:
>>> df
          col1   col2
0     bachelor   None
1     bachelor  clown
2       blabla   None
3  high school   None
>>> df.loc[df['col2'].isnull(), 'col2'] = df[df['col2'].isnull()]['col1'].replace(rep_nan)
>>> df
          col1     col2
0     bachelor  teacher
1     bachelor    clown
2       blabla   blabla
3  high school    actor
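As for where the error is: the function in the question never returns anything, and a Python function without a return statement returns None, so df.apply(..., axis=1) collects one None per row. The sketch below is one possible way to see both fixes side by side, assuming a small toy frame like the one above. The names fill_job, by_apply and by_fillna are made up for illustration and do not come from the original posts.

```python
import numpy as np
import pandas as pd

# Toy data mirroring the example above; the values are illustrative only.
df = pd.DataFrame({
    "col1": ["bachelor", "bachelor", "blabla", "high school"],
    "col2": [np.nan, "clown", np.nan, np.nan],
})

# Mapping from education level to the job used as a fallback.
rep_nan = {"bachelor": "teacher", "blabla": "blabla", "high school": "actor"}


def fill_job(row):
    """Row-wise variant of the question's function (hypothetical name)."""
    row = row.copy()  # work on a copy so the original frame is untouched
    if pd.isna(row["col2"]):
        # .get keeps the missing value when col1 is not in the mapping.
        row["col2"] = rep_nan.get(row["col1"], row["col2"])
    return row  # without this return, apply collects None for every row


by_apply = df.apply(fill_job, axis=1)

# Vectorised variant: map col1 through the dictionary, then use the mapped
# value only where col2 is missing.
by_fillna = df.assign(col2=df["col2"].fillna(df["col1"].map(rep_nan)))

print(by_apply)
print(by_fillna)
```

One difference worth keeping in mind: Series.replace leaves values that are not in the dictionary unchanged, so with the one-liner above a row whose col1 is missing from rep_nan would get the col1 value copied into col2. Series.map returns NaN for such rows, so col2 simply stays missing.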