Conditionally filling missing data based on other variables in R
Sorry for adding a screenshot. I downloaded the data from https://www.kaggle.com/datasets/rikdifos/credit-card-approval-prediction
Can someone tell me how to fill the NA values in the occupation column? I created a new variable, is_working, that indicates whether an applicant is working. Where occupation is NA and is_working is 0 for the same observation, I want to set occupation to zero; all other NA values should stay NA.
df <- data.frame(
  occupation = c("NA", "NA", "Drivers", "Accountants", "NA", "Drivers", "Laborers", "Cleaning staff", "Drivers", "Drivers"),
  is_working = c("1", "0", "1", "1", "1", "1", "1", "1", "1", "1")
)
library(dplyr)

df %>%
  mutate(
    # change the string "NA" to a real missing value NA
    occupation = ifelse(occupation == "NA", NA, occupation),
    # replace NAs with "0" where is_working is "0" (is_working is a character column here)
    occupation = ifelse(is.na(occupation) & is_working == "0", "0", occupation)
  )
#        occupation is_working
# 1            <NA>          1
# 2               0          0
# 3         Drivers          1
# 4     Accountants          1
# 5            <NA>          1
# 6         Drivers          1
# 7        Laborers          1
# 8  Cleaning staff          1
# 9         Drivers          1
# 10        Drivers          1
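
The same two steps can also be sketched in base R, without dplyr. This is a minimal sketch, not the only way to do it: it assumes, as in the example data above, that occupation and is_working are character columns and that missing occupations are stored as the string "NA". The shortened five-row data frame is only for illustration.

```r
# Illustrative subset of the example data (assumption: both columns are character)
df <- data.frame(
  occupation = c("NA", "NA", "Drivers", "Accountants", "NA"),
  is_working = c("1", "0", "1", "1", "1"),
  stringsAsFactors = FALSE
)

# Step 1: turn the string "NA" into a real missing value.
# %in% never returns NA, so this is safe even if some values are already missing.
df$occupation[df$occupation %in% "NA"] <- NA

# Step 2: fill only those missing occupations where is_working is "0"
fill_rows <- is.na(df$occupation) & df$is_working %in% "0"
df$occupation[fill_rows] <- "0"

df
```

Because occupation is a character column, the fill value is the string "0" rather than the number 0. Rows that are NA but have is_working equal to "1" are left as NA.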