Set specific values of a group of variables to NA based on another group of variables
I could use some help with a tidyverse solution to this question.
I'm working with a large dataset that has 20+ binary cancer outcomes (cancer_{cancertype}), as well as corresponding ages ({cancertype}_age). Some individuals are missing cancer phenotype information. I would like to set the age variable for each cancer type to NA if the corresponding cancer phenotype is missing. I've been trying to implement mutate(across()), but am having some issues specifying the appropriate arguments.
# load tidyverse lib
library(tidyverse)
# Set seed for reproducibility
set.seed(42)
# generate dataframe
cancer_ds <- data.frame(id = 1000:1009,
cancer_a = rep(0:1, length = 10),
cancer_b = c(rep(0, 3), NA, NA, 1, NA, rep(1, 3)),
cancer_c = c(rep(0:1, each = 2, len = 6), rep(NA, 4)),
a_age = sample(30:60, 10, FALSE),
b_age = sample(30:60, 10, FALSE),
c_age = sample(30:60, 10, FALSE)
)
cancer_ds
cancer_list <- paste("cancer",letters[seq(1:3)], sep = "_" )
cancer_list
# attempted code
out_ds <- cancer_ds %>%
mutate(across(ends_with("age"), ~replace(is.na(cancer_list)))
# expected output dataset
out_ds_exp <- cancer_ds %>%
mutate(b_age = ifelse(b_age %in% c("43", "49", "47"), NA, b_age),
c_age = ifelse(c_age %in% c("49", "31", "37", "32"), NA, c_age))
out_ds_exp
Any help is appreciated! Thanks.
Here is an option.
cancer_ds %>%
rename_with(~ str_replace_all(.x, "([a-z])_([a-z]{2,})", "\\2_\\1")) %>%
pivot_longer(-id, names_to = c(".value", "grp"), names_sep = "_") %>%
mutate(age = if_else(is.na(cancer), NA_integer_, age)) %>%
pivot_wider(names_from = grp, values_from = c(cancer, age))
## A tibble: 10 x 7
# id cancer_a cancer_b cancer_c age_a age_b age_c
# <int> <dbl> <dbl> <dbl> <int> <int> <int>
# 1 1000 0 0 0 46 33 54
# 2 1001 1 0 0 34 54 56
# 3 1002 0 0 1 30 34 33
# 4 1003 1 NA 1 54 NA 34
# 5 1004 0 NA 0 39 NA 42
# 6 1005 1 1 0 33 55 57
# 7 1006 0 NA NA 47 NA NA
# 8 1007 1 1 NA 60 44 NA
# 9 1008 0 1 NA 44 32 NA
#10 1009 1 1 NA 36 38 NA
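To see what the rename_with step in the pipeline above does on its own, the same regular expression can be applied to a plain character vector of the column names. This is only a small sketch; old_names and new_names are illustrative names that do not appear in the original post.

```r
library(stringr)

# Column names as they appear in cancer_ds
old_names <- c("id", "cancer_a", "cancer_b", "cancer_c",
               "a_age", "b_age", "c_age")

# Swap "<group>_<what>" into "<what>_<group>".
# The pattern needs one letter, an underscore, then at least two letters,
# so "cancer_a" (only one letter after the underscore) is left untouched.
new_names <- str_replace_all(old_names, "([a-z])_([a-z]{2,})", "\\2_\\1")
new_names
# "id" "cancer_a" "cancer_b" "cancer_c" "age_a" "age_b" "age_c"
```

After this step every measured column follows the same "<what>_<group>" scheme, which is what lets pivot_longer split the names on "_" into ".value" and "grp".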
Explanation: We first fix the inconsistent column names using rename_with: you have both "<what>_<group>" (e.g. "cancer_a") and "<group>_<what>" (e.g. "a_age"); then it's a simple matter of reshaping multiple paired columns from wide to long. We can then replace age values with NAs if cancer is NA before reshaping back from long to wide.
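Since the question asked about mutate(across()), here is a rough sketch of how that route might look without any reshaping. It is not part of the answer above. It assumes the naming scheme from the question ("cancer_<type>" paired with "<type>_age") and relies on cur_column() and the .data pronoun being usable inside an across() lambda, which is my understanding for dplyr 1.0 and later. The toy data frame and the flag_col helper variable are made up for illustration.

```r
library(dplyr)

# Small stand-in for cancer_ds (hypothetical values, same naming scheme)
toy <- data.frame(
  id       = 1:4,
  cancer_a = c(0, 1, NA, 1),
  cancer_b = c(NA, 0, 1, NA),
  a_age    = c(40L, 51L, 33L, 60L),
  b_age    = c(45L, 38L, 59L, 31L)
)

out <- toy %>%
  mutate(across(
    ends_with("_age"),
    ~ {
      # "a_age" -> "cancer_a": build the name of the matching phenotype column
      flag_col <- paste0("cancer_", sub("_age$", "", cur_column()))
      # Set the age to NA wherever that phenotype is missing
      replace(.x, is.na(.data[[flag_col]]), NA)
    }
  ))
out
```

Unlike the pivot-based answer, this keeps the original column names (a_age, b_age, ...) and column order, at the cost of encoding the naming convention in a string manipulation inside the lambda.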