
Mapping missing values in one column of a pandas dataframe using a dictionary with reference to another column's values

I have a dataframe:

> print(df)
[Out:]
activity-code    activity
-------------------------
0                unknown
99               NaN
84               sports
72;99            NaN
57               recreational
57;99;11         NaN
11               NaN

and a dictionary with activity codes as keys:

> print(act_dict)
[Out:]
{10: 'unknown',
11: 'cultural',
57: 'recreational',
72: 'social service',
84: 'sports',
99: 'education'}

All values in the dataframe are stored as strings, including those in activity-code, whereas the dictionary keys are integers. I want to fill the missing values in activity by looking up the codes stored in the activity-code column in the dictionary. The desired output dataframe should look something like this:

> print(df)
[Out:]
activity-code    activity
-------------------------
0                unknown
99               education
84               sports
72;99            social service;education
57               recreational
57;99;11         recreational;education;cultural
11               cultural

This is what I've tried so far:

df['new-activity'] = df['activity-code'].str.split(';').apply(lambda x: ';'.join([act_dict[int(i)] for i in x]))

but I'm getting a KeyError on the single-value rows, that is, rows where activity-code is not a list of several codes. The error says KeyError: 0

How do I map the dictionary values to the missing values in the activity column of the dataframe?

Use apply and str.split, then inside apply use a list comprehension and join the results with ';':

df['activity'] = df['activity-code'].str.split(';').apply(lambda x: ';'.join([act_dict[int(i)] for i in x]))

And now:

print(df)

Output:

  activity-code                         activity
0             0                          unknown
1            99                        education
2            84                           sports
3         72;99         social service;education
4            57                     recreational
5      57;99;11  recreational;education;cultural
6            11                         cultural
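The output above presumes that 0 is a key in act_dict. With the dictionary exactly as printed in the question, 'unknown' is stored under 10, so a strict act_dict[...] lookup of code 0 fails. A minimal sketch of that behaviour, assuming the question's dictionary (the helper name strict_lookup is hypothetical, used only for illustration):

```python
# Dictionary as printed in the question: note there is no key 0.
act_dict = {10: 'unknown', 11: 'cultural', 57: 'recreational',
            72: 'social service', 84: 'sports', 99: 'education'}


def strict_lookup(codes):
    """Join the labels for a ';'-separated code string.

    Uses act_dict[...] directly, so an unknown code raises KeyError.
    """
    return ';'.join(act_dict[int(code)] for code in codes.split(';'))


print(strict_lookup('72;99'))  # social service;education

try:
    strict_lookup('0')
except KeyError as exc:
    print('KeyError:', exc)  # KeyError: 0
```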

If your dictionary has no value for 0, you can use filter():

df['activity'] = df['activity-code'].apply(lambda x: '; '.join(filter(None, map(act_dict.get, map(int, x.split(';'))))))
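Both one-liners above overwrite the whole activity column, so a label that is already present for a code missing from the dictionary (such as 'unknown' for 0) would be lost. A possible variation, sketched under the assumption that only the NaN entries should be filled and that unknown codes should simply be skipped, builds the mapped labels separately and passes them to fillna. The helper name codes_to_labels is hypothetical, and the sample data is reconstructed from the question:

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question; all codes are strings.
df = pd.DataFrame({
    'activity-code': ['0', '99', '84', '72;99', '57', '57;99;11', '11'],
    'activity': ['unknown', np.nan, 'sports', np.nan,
                 'recreational', np.nan, np.nan],
})
act_dict = {10: 'unknown', 11: 'cultural', 57: 'recreational',
            72: 'social service', 84: 'sports', 99: 'education'}


def codes_to_labels(codes):
    """Map a ';'-separated code string to ';'-joined labels.

    Assumption: codes absent from act_dict are dropped, not raised on.
    """
    labels = (act_dict.get(int(code)) for code in codes.split(';'))
    return ';'.join(label for label in labels if label is not None)


mapped = df['activity-code'].apply(codes_to_labels)

# fillna aligns on the index and only touches the missing entries,
# so existing labels such as 'unknown' for code 0 are kept.
df['activity'] = df['activity'].fillna(mapped)
print(df)
```

With this data the result matches the desired output in the question, including 'unknown' on the first row.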

