Set specific values of a group of variables to NA based on another group of variables

I could use some help with a tidyverse solution to this question.

I'm working with a large dataset that has 20+ binary cancer outcomes (cancer_{cancertype}), as well as corresponding ages ({cancertype}_age). Some individuals are missing cancer phenotype information. I would like to set the age variable for each cancer type to NA if the corresponding cancer phenotype is missing. I've been trying to implement mutate(across()), but am having trouble specifying the appropriate arguments.

# load tidyverse lib
library(tidyverse)

# Set seed for reproducibility
set.seed(42)

# generate dataframe
cancer_ds <- data.frame(id = 1000:1009,
           cancer_a = rep(0:1, length = 10), 
           cancer_b = c(rep(0, 3), NA, NA, 1, NA, rep(1, 3)), 
           cancer_c = c(rep(0:1, each = 2, len = 6), rep(NA, 4)), 
           a_age = sample(30:60, 10, FALSE), 
           b_age = sample(30:60, 10, FALSE), 
           c_age = sample(30:60, 10, FALSE)
           ) 

cancer_ds

cancer_list <- paste("cancer",letters[seq(1:3)], sep = "_" )

cancer_list

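# attempted code (does not work: replace() is missing its index and value
# arguments, cancer_list holds column names rather than the columns
# themselves, and a closing parenthesis is missing)
out_ds <- cancer_ds %>% 
          mutate(across(ends_with("age"), ~replace(is.na(cancer_list)))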

# expected output dataset 
out_ds_exp <- cancer_ds %>% 
          mutate(b_age = ifelse(b_age %in% c("43", "49", "47"), NA, b_age), 
                 c_age = ifelse(c_age %in% c("49", "31", "37", "32"), NA, c_age))

out_ds_exp

Any help is appreciated! Thanks.

Here is an option.

cancer_ds %>%
    rename_with(~ str_replace_all(.x, "([a-z])_([a-z]{2,})", "\\2_\\1")) %>%
    pivot_longer(-id, names_to = c(".value", "grp"), names_sep = "_") %>%
    mutate(age = if_else(is.na(cancer), NA_integer_, age)) %>%
    pivot_wider(names_from = grp, values_from = c(cancer, age))
## A tibble: 10 x 7
#      id cancer_a cancer_b cancer_c age_a age_b age_c
#   <int>    <dbl>    <dbl>    <dbl> <int> <int> <int>
# 1  1000        0        0        0    46    33    54
# 2  1001        1        0        0    34    54    56
# 3  1002        0        0        1    30    34    33
# 4  1003        1       NA        1    54    NA    34
# 5  1004        0       NA        0    39    NA    42
# 6  1005        1        1        0    33    55    57
# 7  1006        0       NA       NA    47    NA    NA
# 8  1007        1        1       NA    60    44    NA
# 9  1008        0        1       NA    44    32    NA
#10  1009        1        1       NA    36    38    NA

Explanation: We first fix the inconsistent column names using rename_with: you have both "<what>_<group>" (e.g. "cancer_a") and "<group>_<what>" (e.g. "a_age"). After that, it is a simple matter of reshaping multiple paired columns from wide to long with pivot_longer. We can then replace age values with NA wherever cancer is NA, before reshaping back from long to wide with pivot_wider. Note that the age columns come out named age_a, age_b, age_c rather than a_age, b_age, c_age.
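
Since the question asked specifically about mutate(across()), here is a minimal sketch of that route as an alternative to the reshaping approach. It is not part of the answer above. It assumes every "<group>_age" column has a matching "cancer_<group>" column, and it relies on cur_column() (dplyr >= 1.0.0) together with get() to look up the paired column by name. The helper flag_for is a hypothetical name introduced only for this illustration.

```r
library(dplyr)

# Same toy data as in the question
set.seed(42)
cancer_ds <- data.frame(
  id = 1000:1009,
  cancer_a = rep(0:1, length.out = 10),
  cancer_b = c(rep(0, 3), NA, NA, 1, NA, rep(1, 3)),
  cancer_c = c(rep(0:1, each = 2, length.out = 6), rep(NA, 4)),
  a_age = sample(30:60, 10, FALSE),
  b_age = sample(30:60, 10, FALSE),
  c_age = sample(30:60, 10, FALSE)
)

# Hypothetical helper: map an age column name to its cancer column name,
# e.g. "b_age" -> "cancer_b"
flag_for <- function(age_col) paste0("cancer_", sub("_age$", "", age_col))

out_ds <- cancer_ds %>%
  mutate(across(
    ends_with("_age"),
    # cur_column() is the name of the age column currently being processed;
    # get() fetches the matching cancer column from the data frame
    ~ replace(.x, is.na(get(flag_for(cur_column()))), NA)
  ))

out_ds
```

Unlike the pivot-based option, this keeps the original column names and order, at the cost of depending on the naming convention being applied consistently across all cancer types.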
