
How to change values in multiple columns using the across function in R?

I have a data frame in which I would like to go through all columns that end with _qc and, wherever the value is "4", set the corresponding column without the _qc suffix to NA.

For example, if the value in a column named chla_adjusted_qc is "4", then the value of chla_adjusted in the same row should be set to NA.

library(tidyverse)


df <- tibble(
  chla_adjusted = c(100, 2),
  chla_adjusted_qc = c("4", "1"),
  bbp_adjusted = c(0.1, 9999),
  bbp_adjusted_qc = c("2", "4")
)

df
#> # A tibble: 2 × 4
#>   chla_adjusted chla_adjusted_qc bbp_adjusted bbp_adjusted_qc
#>           <dbl> <chr>                   <dbl> <chr>          
#> 1           100 4                         0.1 2              
#> 2             2 1                      9999   4

The desired output would be:

tibble(
  chla_adjusted = c(NA, 2),
  chla_adjusted_qc = c("4", "1"),
  bbp_adjusted = c(0.1, NA),
  bbp_adjusted_qc = c("2", "4")
)
#> # A tibble: 2 × 4
#>   chla_adjusted chla_adjusted_qc bbp_adjusted bbp_adjusted_qc
#>           <dbl> <chr>                   <dbl> <chr>          
#> 1            NA 4                         0.1 2              
#> 2             2 1                        NA   4

So far, I have managed to grab the current column name and derive the name of the corresponding column in which I want to set the NA value:

df |>
  mutate(across(ends_with("_qc"), \(var) {
    # If var is chla_adjusted_qc, then lets modify the value in chla_adjusted
    col <- str_remove(cur_column(), "_qc")

    # if (var == "4") {
    #   # What to do here?
    # }
  }))
#> # A tibble: 2 × 4
#>   chla_adjusted chla_adjusted_qc bbp_adjusted bbp_adjusted_qc
#>           <dbl> <chr>                   <dbl> <chr>          
#> 1           100 chla_adjusted             0.1 bbp_adjusted   
#> 2             2 chla_adjusted          9999   bbp_adjusted

Thank you.

Created on 2022-12-20 with reprex v2.0.2

df %>%
  mutate(across(ends_with("_qc"),
                ~ replace(cur_data()[[ sub("_qc$", "", cur_column()) ]], . == 4L, NA),
                .names = "{sub('_qc$', '', .col)}"))
# # A tibble: 2 × 4
#   chla_adjusted chla_adjusted_qc bbp_adjusted bbp_adjusted_qc
#           <dbl> <chr>                   <dbl> <chr>          
# 1            NA 4                         0.1 2              
# 2             2 1                        NA   4              
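Note that cur_data() was deprecated in dplyr 1.1.0 in favor of pick(). One way to sidestep the question entirely is to loop over the flag columns and let mutate() overwrite each target column by name. The following is only a sketch under that assumption, not the answer's method; the names qc_cols, target and out are illustrative.

```r
library(dplyr)

df <- tibble(
  chla_adjusted = c(100, 2),
  chla_adjusted_qc = c("4", "1"),
  bbp_adjusted = c(0.1, 9999),
  bbp_adjusted_qc = c("2", "4")
)

# Names of the flag columns, e.g. "chla_adjusted_qc"
qc_cols <- grep("_qc$", names(df), value = TRUE)

out <- df
for (qc in qc_cols) {
  # Name of the value column that belongs to this flag column
  target <- sub("_qc$", "", qc)
  # "{target}" := ... overwrites the column whose name is stored in target;
  # .data[[...]] looks a column up by a name held in a string
  out <- out |>
    mutate("{target}" := replace(.data[[target]], .data[[qc]] == "4", NA))
}

out
```

Because the flag columns hold character values, the comparison is written against "4" rather than the number 4.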

Base R solution:

for (v in grep("_qc$", names(df), value = TRUE)) {
  df[[sub("_qc$", "", v)]][df[[v]] == 4] <- NA
}


> df
# A tibble: 2 × 4
  chla_adjusted chla_adjusted_qc bbp_adjusted bbp_adjusted_qc
          <dbl> <chr>                   <dbl> <chr>          
1            NA 4                         0.1 2              
2             2 1                        NA   4              
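If the same masking is needed in more than one place, the loop can be wrapped in a function that returns a modified copy instead of changing df in place. This is a minimal sketch: the function name mask_flagged and its arguments suffix and bad are hypothetical, and it assumes that flag columns without a matching value column should simply be skipped.

```r
# Hypothetical helper: set values to NA wherever the paired flag column
# contains one of the values in `bad`.
mask_flagged <- function(data, suffix = "_qc", bad = "4") {
  qc_cols <- names(data)[endsWith(names(data), suffix)]
  for (qc in qc_cols) {
    # Strip the suffix to get the name of the value column
    target <- substr(qc, 1, nchar(qc) - nchar(suffix))
    if (!target %in% names(data)) next  # no matching value column
    # %in% never returns NA, so missing flags leave the value untouched
    data[[target]][data[[qc]] %in% bad] <- NA
  }
  data
}

df <- data.frame(
  chla_adjusted = c(100, 2),
  chla_adjusted_qc = c("4", "1"),
  bbp_adjusted = c(0.1, 9999),
  bbp_adjusted_qc = c("2", "4")
)

res <- mask_flagged(df)
res
```

Passing several flag values, for example bad = c("3", "4"), masks all of them in one call.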

We could use across2 from dplyover:

library(dplyover)
df %>%
  mutate(across2(ends_with('adjusted'), ends_with('_qc'),
                 ~ case_when(.y != 4 ~ .x), .names = "{xcol}"))

Output:

# A tibble: 2 × 4
  chla_adjusted chla_adjusted_qc bbp_adjusted bbp_adjusted_qc
          <dbl> <chr>                   <dbl> <chr>          
1            NA 4                         0.1 2              
2             2 1                        NA   4         
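across2 walks through the two column selections in pairs, by position, so the result is only correct when the value columns and the flag columns appear in the same order. A quick base R check along these lines can confirm that before the call; it is a sketch, and val_cols and qc_cols are illustrative names.

```r
df <- data.frame(
  chla_adjusted = c(100, 2),
  chla_adjusted_qc = c("4", "1"),
  bbp_adjusted = c(0.1, 9999),
  bbp_adjusted_qc = c("2", "4")
)

# The same two selections as in the across2() call, expressed on the names
val_cols <- names(df)[endsWith(names(df), "adjusted")]
qc_cols <- names(df)[endsWith(names(df), "_qc")]

# Each value column must line up with its own flag column
paired <- identical(paste0(val_cols, "_qc"), qc_cols)
stopifnot(paired)
```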
