I'm running a linear regression, but many of my observations cannot be used because some of the values in the row are NA. I know that if at least one of a set of variables has been entered, then an NA in that row is actually 0. However, if all the values are NA, then they should stay NA. I will include an example because I know this might be confusing.
What I have is something that looks like this:
df <- data.frame(outcome = c(1, 0, 1, 1, 0),
Var1 = c(1, 0, 1, NA, NA),
Var2 = c(NA, 1, 0, 0, NA),
Var3 = c(0, 1, NA, 1, NA))
For Var1 to Var3, each of the first 4 rows has an NA but also has entries in the other variables. In the last row, however, all values are NA. I want the last row to stay NA, but I want the NAs in the first 4 rows to be filled with 0. The desired outcome would look like this:
desired <- data.frame(outcome = c(1, 0, 1, 1, 0),
Var1 = c(1, 0, 1, 0, NA),
Var2 = c(0, 1, 0, 0, NA),
Var3 = c(0, 1, 0, 1, NA))
I know there are messy ways I could go about this, but I was wondering what would be the most streamlined process for this?
I hope this makes sense, I know the question is confusing. I can clarify anything if needed.
We can create a logical vector with rowSums and use it to subset the rows before changing the NA values to 0:
i1 <- rowSums(!is.na(df[-1])) > 0
df[i1, -1][is.na(df[i1, -1])] <- 0
Checking against desired:
identical(df, desired)
#[1] TRUE
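The same rowSums idea can be wrapped in a small function so it can be reused on other sets of columns. This is only a sketch: the name fill_partial_na and its cols argument are made up for illustration and are not part of the answer above.

```r
# Hypothetical helper (name and arguments are illustrative assumptions):
# replace NA with 0 only in rows where at least one of `cols` is observed.
fill_partial_na <- function(data, cols) {
  # TRUE for rows that have at least one non-NA value among `cols`
  has_any <- rowSums(!is.na(data[cols])) > 0
  # Logical matrix of NA positions; the row mask is recycled down each
  # column, so NA cells in all-NA rows are switched off
  to_fill <- is.na(data[cols]) & has_any
  data[cols][to_fill] <- 0
  data
}

df <- data.frame(outcome = c(1, 0, 1, 1, 0),
                 Var1 = c(1, 0, 1, NA, NA),
                 Var2 = c(NA, 1, 0, 0, NA),
                 Var3 = c(0, 1, NA, 1, NA))

result <- fill_partial_na(df, c("Var1", "Var2", "Var3"))
result
```

Selecting the columns by name rather than with df[-1] keeps the outcome column out of the check even if the column order changes.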
You can use apply to conditionally replace NA in certain rows:
data.frame(t(apply(df, 1, function(x) if (all(is.na(x[-1]))) x else replace(x, is.na(x), 0))))
Output
  outcome Var1 Var2 Var3
1       1    1    0    0
2       0    0    1    1
3       1    1    0    0
4       1    0    0    1
5       0   NA   NA   NA
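One thing to keep in mind with the apply approach is that apply converts a data frame to a matrix first, so it is safest when every column is numeric, as in the example. The sketch below uses a made-up data frame (df_mixed, with a hypothetical label column) to show what each row looks like inside apply once a character column is present.

```r
# Made-up data with one non-numeric column, for illustration only
df_mixed <- data.frame(outcome = c(1, 0),
                       Var1 = c(1, NA),
                       label = c("a", "b"))

# apply() works on as.matrix(df_mixed), which is a character matrix here
m <- as.matrix(df_mixed)
is.character(m)

# Each row therefore reaches the function as a character vector
row_is_char <- apply(df_mixed, 1, is.character)
row_is_char
```

In that situation the rowSums approach, applied only to the numeric columns, avoids the conversion.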