
Conditionally filling missing data based on other variables in R


Sorry for adding the screenshot. I downloaded the data from https://www.kaggle.com/datasets/rikdifos/credit-card-approval-prediction

How can I fill the NA values in the occupation column? I created a new variable, is_working, that indicates whether an applicant is working. Where occupation is NA and is_working is 0 for the same observation, I want to fill occupation with zero; all other NA values should stay NA.

df <- data.frame(occupation = c("NA", "NA", "Drivers", "Accountants", "NA", "Drivers", "Laborers", "Cleaning staff", "Drivers", "Drivers"),
                 is_working = c("1", "0", "1", "1", "1", "1", "1", "1", "1", "1"))
library(dplyr)
df %>%
  mutate(
    # change string "NA" to missing values NA
    occupation = ifelse(occupation == "NA", NA, occupation),
    # replace NAs where is_working is 0 with 0
    occupation = ifelse(is.na(occupation) & is_working == 0, "0", occupation)
  )
#        occupation is_working
# 1            <NA>          1
# 2               0          0
# 3         Drivers          1
# 4     Accountants          1
# 5            <NA>          1
# 6         Drivers          1
# 7        Laborers          1
# 8  Cleaning staff          1
# 9         Drivers          1
# 10        Drivers          1
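
For anyone who prefers to avoid the dplyr dependency, the same two steps can be sketched in base R with logical indexing. This is only a sketch under the same assumptions as the example above: missing occupations are stored as the literal string "NA", and is_working is a character column. The helper name not_working is made up for illustration.

```r
# Example data from the question; "NA" is a literal string here, not a missing value.
df <- data.frame(
  occupation = c("NA", "NA", "Drivers", "Accountants", "NA", "Drivers",
                 "Laborers", "Cleaning staff", "Drivers", "Drivers"),
  is_working = c("1", "0", "1", "1", "1", "1", "1", "1", "1", "1"),
  stringsAsFactors = FALSE
)

# Step 1: turn the literal string "NA" into a real missing value.
df$occupation[df$occupation == "NA"] <- NA

# Step 2: fill with "0" only where occupation is missing AND the applicant
# is not working. `not_working` is a hypothetical helper name.
not_working <- is.na(df$occupation) & df$is_working == "0"
df$occupation[not_working] <- "0"

df
```

If the real file is read with read.csv, empty or NA cells may already come in as true missing values, in which case step 1 is probably unnecessary. Note that occupation stays a character column, so the fill value is the string "0" rather than the number 0.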
