Summing across rows for certain columns, keeping NA if all are NA
I have clinical data with a number of different binary outcomes, but I only want to sum a few of them to create a total outcome/composite score. My data looks something like this:
```
patientid <- c(100,101,102,103,104,105,106)
outcome1 <- c(0,NA,1,0,1,NA,1)
outcome2 <- c(0,1,1,0,0,NA,1)
outcome3 <- c(0,NA,NA,0,1,NA,0)
outcome4 <- c(NA,NA,NA,0,1,NA,0)
Data <- data.frame(patientid=patientid, outcome1=outcome1, outcome2=outcome2, outcome3=outcome3, outcome4=outcome4)
Data
```
Now I want to create a composite score for just three of the outcomes. NA should count as zero UNLESS every outcome chosen for the sum is NA, in which case the result should stay NA. I assume this is done with rowSums? Here is what my desired data frame should look like (summing just outcomes 1, 2 and 4):
```
patientid <- c(100,101,102,103,104,105,106)
outcome1 <- c(0,NA,1,0,1,NA,1)
outcome2 <- c(0,1,1,0,1,NA,1)
outcome3 <- c(0,NA,NA,0,1,NA,0)
outcome4 <- c(NA,NA,NA,0,1,NA,0)
composite <- c(0,1,2,0,3,NA,2)
data.frame(patientid=patientid, outcome1=outcome1, outcome2=outcome2, outcome3=outcome3, outcome4=outcome4, composite=composite)
Data
```
Try this approach using `c_across()`. I am a bit confused about why the desired output has some column values that differ from the original data. You can use `c_across()` together with `rowwise()` to sum the selected columns within each row, and then flag the rows where all of those columns are `NA`. Here is the code:
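library(tidyverse)
# Sum the selected outcomes row by row, treating NA as 0
NewData <- Data %>%
  rowwise(patientid) %>%
  mutate(Composite = sum(c_across(c(outcome1, outcome2, outcome4)), na.rm = TRUE)) %>%
  # Flag rows where every selected outcome is NA, then set their score back to NA
  mutate(Flag = ifelse(sum(!is.na(c_across(c(outcome1, outcome2, outcome4)))) == 0, 1, 0),
         Composite = ifelse(Flag == 1, NA, Composite)) %>%
  select(-Flag)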
Output:
# A tibble: 7 x 6
# Rowwise:  patientid
  patientid outcome1 outcome2 outcome3 outcome4 Composite
      <dbl>    <dbl>    <dbl>    <dbl>    <dbl>     <dbl>
1       100        0        0        0       NA         0
2       101       NA        1       NA       NA         1
3       102        1        1       NA       NA         2
4       103        0        0        0        0         0
5       104        1        0        1        1         2
6       105       NA       NA       NA       NA        NA
7       106        1        1        0        0         2
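The same rule (sum with NA counted as 0, but keep NA when every selected column is NA) can probably also be written without `rowwise()`, by calling `rowSums()` on the columns picked out with `across()`. This is only a sketch, assuming dplyr 1.0 or later; the helper column `n_observed` and the vector `cols` are names made up for this illustration and are not part of the answer above.

```r
library(dplyr)

# Same example data as in the question
Data <- data.frame(
  patientid = c(100, 101, 102, 103, 104, 105, 106),
  outcome1  = c(0, NA, 1, 0, 1, NA, 1),
  outcome2  = c(0, 1, 1, 0, 0, NA, 1),
  outcome3  = c(0, NA, NA, 0, 1, NA, 0),
  outcome4  = c(NA, NA, NA, 0, 1, NA, 0)
)

# Columns that go into the composite score (illustrative variable name)
cols <- c("outcome1", "outcome2", "outcome4")

NewData <- Data %>%
  mutate(
    # number of non-missing values among the selected columns, per row
    n_observed = rowSums(!is.na(across(all_of(cols)))),
    # row sums with NA treated as 0
    Composite  = rowSums(across(all_of(cols)), na.rm = TRUE),
    # rows with no observed value at all go back to NA
    Composite  = replace(Composite, n_observed == 0, NA)
  ) %>%
  select(-n_observed)

NewData
```

Because nothing here is computed one row at a time, this variant should scale better than `rowwise()` on large tables, and the result is a plain data frame rather than a rowwise tibble.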
In base R, you can use `rowSums`:
# select the columns that we want to count
cols <- paste0('outcome', c(1:2, 4))
# sum them rowwise, treating NA as 0
Data$composite <- rowSums(Data[cols], na.rm = TRUE)
# set rows where every selected column is NA back to NA
Data$composite[rowSums(!is.na(Data[cols])) == 0] <- NA
Data
#  patientid outcome1 outcome2 outcome3 outcome4 composite
#1       100        0        0        0       NA         0
#2       101       NA        1       NA       NA         1
#3       102        1        1       NA       NA         2
#4       103        0        0        0        0         0
#5       104        1        0        1        1         2
#6       105       NA       NA       NA       NA        NA
#7       106        1        1        0        0         2
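If the same composite has to be built for several different sets of outcomes, the two `rowSums` steps above could be wrapped in a small function. The following is a minimal sketch; the function name `composite_score` and its arguments are hypothetical and chosen only for this example.

```r
# Hypothetical helper: sum the given columns row by row, counting NA as 0,
# but return NA for rows where every selected column is NA.
composite_score <- function(df, cols) {
  vals  <- df[cols]
  total <- rowSums(vals, na.rm = TRUE)
  total[rowSums(!is.na(vals)) == 0] <- NA
  unname(total)
}

# Same example data as in the question
Data <- data.frame(
  patientid = c(100, 101, 102, 103, 104, 105, 106),
  outcome1  = c(0, NA, 1, 0, 1, NA, 1),
  outcome2  = c(0, 1, 1, 0, 0, NA, 1),
  outcome3  = c(0, NA, NA, 0, 1, NA, 0),
  outcome4  = c(NA, NA, NA, 0, 1, NA, 0)
)

Data$composite <- composite_score(Data, c("outcome1", "outcome2", "outcome4"))
Data
```

Passing a different character vector of column names, for example `c("outcome2", "outcome3")`, reuses the same NA rule for another score.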
library(tidyverse)
Data %>%
  rowwise() %>%
  mutate(
    Composite = if_else(
      all(is.na(c(outcome1, outcome2, outcome4))),       # TRUE when every selected column is NA
      NA_real_,                                           # all-NA rows produce NA
      sum(c(outcome1, outcome2, outcome4), na.rm = TRUE)  # otherwise NAs are treated as 0s
    )
  )
#  patientid outcome1 outcome2 outcome3 outcome4 Composite
#1       100        0        0        0       NA         0
#2       101       NA        1       NA       NA         1
#3       102        1        1       NA       NA         2
#4       103        0        0        0        0         0
#5       104        1        0        1        1         2
#6       105       NA       NA       NA       NA        NA
#7       106        1        1        0        0         2