Impute the missing values of one categorical variable based on the values of a second categorical variable in a pandas DataFrame
I need to impute the missing values in the first categorical column (values 0 or 1) from the second categorical column (values 1, 2 or 3): if the second column is 2 or 3, set the missing value in that row of the first column to 1; otherwise set it to 0.
Problem:
Col A | Col B |
---|---|
0 | 1 |
1 | 2 |
NaN | 3 |
NaN | 2 |
NaN | 1 |
0 | 1 |
Expected:
Col A | Col B |
---|---|
0 | 1 |
1 | 2 |
1 | 3 |
1 | 2 |
0 | 1 |
0 | 1 |
Use Series.fillna to replace the missing values, passing it a Series built from Col B: Series.isin tests each row for 2 or 3, and converting the resulting True, False mask to integers maps it to 1, 0. Because fillna aligns on the index, each missing row in Col A receives the value computed for the same row of Col B:
df['Col A'] = df['Col A'].fillna(df['Col B'].isin([2,3]).astype(int))
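For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the sample frame from the question and applies the same one-liner. The intermediate name `fill_values` and the final `astype(int)` are additions for illustration, not part of the original answer; the cast is only there because a column that held NaN is stored as float.

```python
import numpy as np
import pandas as pd

# Sample data from the question: Col A has three missing values.
df = pd.DataFrame({
    "Col A": [0, 1, np.nan, np.nan, np.nan, 0],
    "Col B": [1, 2, 3, 2, 1, 1],
})

# 1 where Col B is 2 or 3, otherwise 0 (hypothetical helper name).
fill_values = df["Col B"].isin([2, 3]).astype(int)

# fillna aligns on the index, so only the NaN rows of Col A are replaced.
# The trailing astype(int) is optional and assumes no NaN remains afterwards.
df["Col A"] = df["Col A"].fillna(fill_values).astype(int)

print(df)
```

Rows where Col A already has a value are left untouched, so the existing 0 and 1 entries are not overwritten by the rule.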