Mapping missing values in one column of a pandas dataframe using a dictionary, with reference to another column's values
I have a dataframe:
> print(df)
[Out:]
activity-code activity
-------------------------
0 unknown
99 NaN
84 sports
72;99 NaN
57 recreational
57;99;11 NaN
11 NaN
and a dictionary with activity codes as keys:
> print(act_dict)
[Out:]
{10: 'unknown',
11: 'cultural',
57: 'recreational',
72: 'social service',
84: 'sports',
99: 'education'}
All the values in the dataframe are stored as strings, including those in the activity-code column, whereas the dictionary keys are integers. I want to fill the missing values in the activity column by mapping the codes in the activity-code column through the dictionary.
So the desired output dataframe should look something like this:
> print(df)
[Out:]
activity-code activity
-------------------------
0 unknown
99 education
84 sports
72;99 social service;education
57 recreational
57;99;11 recreational;education;cultural
11 cultural
This is what I've tried so far:
df['new-activity'] = df['activity-code'].str.split(';').apply(lambda x: ';'.join([act_dict[int(i)] for i in x]))
but I'm getting a KeyError. The error says:
KeyError: 0
How do I map the dictionary values to the missing values in the activity column of the dataframe?
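The KeyError: 0 can be reproduced without pandas. In the act_dict printed above the key is 10, not 0, so act_dict[int('0')] fails on the first row. A minimal sketch; lookup_codes is a hypothetical helper, not part of the original code:

```python
# The dictionary exactly as printed in the question: note key 10, not 0.
act_dict = {10: 'unknown', 11: 'cultural', 57: 'recreational',
            72: 'social service', 84: 'sports', 99: 'education'}


def lookup_codes(codes):
    """Hypothetical helper: map a ';'-separated code string to activity names."""
    return ';'.join(act_dict[int(c)] for c in codes.split(';'))


print(lookup_codes('72;99'))  # social service;education

try:
    lookup_codes('0')         # '0' -> int 0, which is not a key
except KeyError as err:
    print('KeyError:', err)   # KeyError: 0
```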
Use str.split and apply; inside apply, use a list comprehension and join the results with ';':
df['activity'] = df['activity-code'].str.split(';').apply(lambda x: ';'.join([act_dict[int(i)] for i in x]))
And now:
print(df)
Output:
activity-code activity
0 0 unknown
1 99 education
2 84 sports
3 72;99 social service;education
4 57 recreational
5 57;99;11 recreational;education;cultural
6 11 cultural
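A self-contained version of the same split/apply/join approach, rebuilding the sample frame from the question. This is a sketch under one assumption: the dictionary includes 0: 'unknown'. The act_dict printed in the question has the key 10 instead, and without a 0 entry act_dict[int(i)] raises KeyError: 0; the output above implies such an entry exists.

```python
import numpy as np
import pandas as pd

# Sample frame from the question; every value is a string, missing ones are NaN.
df = pd.DataFrame({
    'activity-code': ['0', '99', '84', '72;99', '57', '57;99;11', '11'],
    'activity': ['unknown', np.nan, 'sports', np.nan, 'recreational', np.nan, np.nan],
})

# Assumption: 0 maps to 'unknown'. The printed dict only has 10, which is
# what triggers "KeyError: 0" when every code is looked up with [].
act_dict = {0: 'unknown', 10: 'unknown', 11: 'cultural', 57: 'recreational',
            72: 'social service', 84: 'sports', 99: 'education'}

# Split each code string, convert each piece to int, look it up, re-join with ';'.
df['activity'] = df['activity-code'].str.split(';').apply(
    lambda codes: ';'.join([act_dict[int(c)] for c in codes])
)
print(df)
```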
If your dictionary has no value for 0, you can use filter() instead:
df['activity'] = df['activity-code'].apply(lambda x: ';'.join(filter(None, map(act_dict.get, map(int, x.split(';'))))))
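The code in this answer overwrites the whole activity column, and with the filter() version a row whose only code is missing from the dictionary (0 here) ends up as an empty string. If, as the question asks, only the missing values should be filled, one option is to pass the mapped strings to fillna, so existing values such as 'unknown' are kept. A sketch, assuming the question's dictionary exactly as printed and a hypothetical helper map_codes:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'activity-code': ['0', '99', '84', '72;99', '57', '57;99;11', '11'],
    'activity': ['unknown', np.nan, 'sports', np.nan, 'recreational', np.nan, np.nan],
})

# The dictionary as printed in the question (no key 0).
act_dict = {10: 'unknown', 11: 'cultural', 57: 'recreational',
            72: 'social service', 84: 'sports', 99: 'education'}


def map_codes(code_str):
    """Hypothetical helper: map '57;99' -> 'recreational;education'.

    Codes missing from act_dict are skipped via dict.get + filter(None, ...).
    Returns NaN when no code can be mapped, so fillna leaves that cell alone.
    """
    names = list(filter(None, map(act_dict.get, map(int, code_str.split(';')))))
    return ';'.join(names) if names else np.nan


# Fill only the missing cells; existing values (e.g. 'unknown' for code 0) stay.
df['activity'] = df['activity'].fillna(df['activity-code'].map(map_codes))
print(df)
```

Returning NaN rather than an empty string for unmappable rows means fillna leaves those cells missing instead of silently filling them with ''.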