
Recode NA when another column value is NA in R

I have a quick recoding question. Here is what my sample dataset looks like:

df <- data.frame(id = c(1,2,3),
                 i1 = c(1,NA,0),
                 i2 = c(1,1,1))

> df
  id i1 i2
1  1  1  1
2  2 NA  1
3  3  0  1

When i1 is NA, I need to recode i2 to NA as well. I tried the code below, but had no luck:

df %>%
  mutate(i2 = case_when(
    i1 == NA ~  NA_real_,
    TRUE ~ as.character(i2)))

Error in `mutate()`:
! Problem while computing `i2 = case_when(i1 == "NA" ~ NA_real_, TRUE ~ as.character(i2))`.
Caused by error in `` names(message) <- `*vtmp*` ``:
! 'names' attribute [1] must be the same length as the vector [0]

My desired output looks like this:

> df
  id i1 i2
1  1  1  1
2  2 NA NA
3  3  0  1
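As for why the case_when() attempt above fails, there appear to be two separate problems. First, i1 == NA never evaluates to TRUE: any comparison with NA returns NA, so missingness has to be tested with is.na(i1). Second, case_when() needs all right-hand sides to share a compatible type, and NA_real_ (double) and as.character(i2) (character) do not; that mismatch is most likely what triggers the confusing error. A minimal corrected sketch, assuming dplyr is installed:

```r
library(dplyr)

df <- data.frame(id = c(1, 2, 3),
                 i1 = c(1, NA, 0),
                 i2 = c(1, 1, 1))

# Comparing with NA yields NA (never TRUE), so this condition matches no rows
df$i1 == NA

# Test for missingness with is.na(), and keep both branches numeric
out <- df %>%
  mutate(i2 = case_when(
    is.na(i1) ~ NA_real_,  # i1 missing: set i2 to a double NA
    TRUE      ~ i2         # otherwise keep i2 unchanged (still double)
  ))
out
```

Because both branches are doubles, i2 keeps its numeric type instead of being converted to character.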

Here is an option:

# For each row: if it contains an NA, replace the row by its cumulative sum
t(apply(df, 1, \(x) if (any(is.na(x))) cumsum(x) else x))
#     id i1 i2
#[1,]  1  1  1
#[2,]  2 NA NA
#[3,]  3  0  1

The idea is to take the cumulative sum of every row that contains an NA: if term i is NA, all subsequent terms i+1, i+2, and so on are NA as well (since, e.g., NA + 1 = NA). Since your sample data df is all numeric, I recommend using a matrix rather than a data.frame; matrix operations are usually faster than data.frame (i.e. list) operations.
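To see the NA-propagation mechanism in isolation, here is a tiny base R sketch; the vector row is a hypothetical stand-in for the second row of df:

```r
# A single row of df as a named numeric vector (id = 2, i1 = NA, i2 = 1)
row <- c(id = 2, i1 = NA, i2 = 1)

NA + 1       # NA: arithmetic with a missing value stays missing
cumsum(row)  # id stays 2; from i1 onwards every term is NA, so i2 becomes NA
```

The first term of a cumulative sum is the value itself, which is why id survives unchanged.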

Key assumptions:

  1. id cannot be NA.
  2. This replaces NAs in i2 based on an NA in i1, row by row.
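One caveat worth checking (my own observation, not part of the answer): cumsum() replaces every column with a running total, so a row in which i1 is present but i2 is missing would have i1 overwritten by id + i1. The sketch below adds a hypothetical fourth row to show this, and compares a guarded variant that applies cumsum() only when i1 itself is NA (function(x) is used instead of \(x) so it also runs on R versions before 4.1):

```r
df <- data.frame(id = c(1, 2, 3, 4),
                 i1 = c(1, NA, 0, 5),
                 i2 = c(1, 1, 1, NA))  # row 4 is hypothetical: i1 present, i2 missing

# Original rule: cumsum whenever the row contains any NA
any_na <- t(apply(df, 1, function(x) if (any(is.na(x))) cumsum(x) else x))

# Guarded rule: cumsum only when i1 itself is NA
i1_na <- t(apply(df, 1, function(x) if (is.na(x["i1"])) cumsum(x) else x))

any_na[4, ]  # i1 has become 9 (id + i1)
i1_na[4, ]   # row 4 is left unchanged
```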

A tidyverse solution

I advise against a tidyverse solution here, for a couple of reasons:

  1. Your data are all numeric, so a matrix is a more suitable data structure than a data.frame/tibble.
  2. dplyr/tidyr verbs are designed to operate efficiently on columns; as soon as you need to work row-wise, dplyr (and its sister packages) may not be the best fit (dplyr::rowwise() notwithstanding, which merely introduces a grouping by row number).

With that out of the way, you can transpose the problem: turn the data frame into a list of rows, modify each row, then transpose back.

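library(tidyverse)
df %>%
    transpose() %>%                                          # data frame -> list of rows
    map(~ { if (is.na(.x$i1)) .x$i2 <- NA_real_; .x }) %>%  # recode i2 within each row
    transpose() %>%                                          # list of rows -> list of columns
    as_tibble() %>%                                          # columns are list-columns here
    unnest(everything())                                     # flatten back to numeric columns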
## A tibble: 3 × 3
#     id    i1    i2
#  <dbl> <dbl> <dbl>
#1     1     1     1
#2     2    NA    NA
#3     3     0     1

Would a simple assignment meet your requirements?

df$i2[is.na(df$i1)] <- NA
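The one-liner works through logical indexing: is.na(df$i1) produces one TRUE/FALSE per row, and only the positions of i2 flagged TRUE are overwritten. A small sketch that spells out the intermediate mask (the name mask is just for illustration):

```r
df <- data.frame(id = c(1, 2, 3),
                 i1 = c(1, NA, 0),
                 i2 = c(1, 1, 1))

mask <- is.na(df$i1)  # logical vector: FALSE TRUE FALSE
df$i2[mask] <- NA     # assign NA only where i1 is missing
df
```

Since this is a single vectorised assignment on one column, it avoids both the row-wise apply() and the type juggling that case_when() requires.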
