
R: remove rows with NA in groups of columns containing the same string

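I have a data frame with several variables, each measured with multiple items at two different time points. I want to remove every row in which a whole group of columns, defined by a shared part of the column name, contains nothing but NA. Some of these groups contain several columns (e.g., those matched by grep("learn")), others only one (e.g., T1_age). Here is (part of) my original data frame: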

data <- data.frame(
      T1_age = c(39, 30, 20, 48, 27, 55, 37, 50, 50, 37),
      T1_sex = c(2, 1, 1, 2, 2, 1, 1, 2, 1, 1),
   T2_learn1 = c(2, NA, 3, 4, 1, NA, NA, 2, 4, 4),
   T2_learn2 = c(1, NA, 4, 4, 1, NA, NA, 2, 4, 4),
   T2_learn3 = c(2, NA, 4, 4, 1, NA, NA, 3, 4, 4),
   T2_learn4 = c(2, NA, 2, 5, 5, NA, NA, 5, 5, 5),
   T2_learn5 = c(4, NA, 3, 4, 3, NA, NA, 3, 4, 3),
     T2_aut1 = c(NA, NA, 4, 4, 4, NA, NA, 3, 5, 4),
     T2_aut2 = c(NA, NA, 4, 4, 4, NA, NA, 3, 5, 5),
     T2_aut3 = c(NA, NA, 4, 4, 3, NA, NA, 3, 5, 5),
    T2_ssup1 = c(1, NA, 4, 5, 4, NA, NA, 2, 4, 3),
    T2_ssup2 = c(3, NA, 4, 5, 5, NA, NA, 3, 4, 4),
    T2_ssup3 = c(4, NA, 4, 5, 5, NA, NA, 4, 4, 4),
    T2_ssup4 = c(2, NA, 3, 5, 5, NA, NA, 3, 4, 4),
   T3_learn1 = c(3, NA, NA, 4, 4, NA, NA, 3, 3, 4),
   T3_learn2 = c(1, NA, NA, 4, 3, NA, NA, 3, 3, 4),
   T3_learn3 = c(3, NA, NA, 4, 4, NA, NA, 3, 3, 5),
   T3_learn4 = c(4, NA, NA, 5, 4, NA, NA, 4, 5, 5),
   T3_learn5 = c(4, NA, NA, 3, 4, NA, NA, 3, 3, 4),
     T3_aut1 = c(NA, NA, NA, 4, 4, NA, NA, 3, 5, 5),
     T3_aut2 = c(NA, NA, NA, 3, 4, NA, NA, 3, 5, 5),
     T3_aut3 = c(NA, NA, NA, 3, 2, NA, NA, 3, 5, 5),
    T3_ssup1 = c(3, NA, NA, 5, 4, NA, NA, 2, 4, 1),
    T3_ssup2 = c(3, NA, NA, 5, 5, NA, NA, 4, 5, 5),
    T3_ssup3 = c(4, NA, NA, 5, 5, NA, NA, 4, 5, 3),
    T3_ssup4 = c(3, NA, NA, 5, 5, NA, NA, 4, 5, 4)
)


I have found a solution, but it is really ugly and I am sure it can be improved. This code does basically what I want:

library(dplyr)
library(tidyr)

data <- data %>% filter(rowSums(is.na(.[ , grep("learn", colnames(.))])) != ncol(.[ , grep("learn", colnames(.))]))
data <- data %>% filter(rowSums(is.na(.[ , grep("aut", colnames(.))])) != ncol(.[ , grep("aut", colnames(.))]))
data <- data %>% filter(rowSums(is.na(.[ , grep("ssup", colnames(.))])) != ncol(.[ , grep("ssup", colnames(.))]))
data <- data %>% drop_na(T1_age)
data <- data %>% drop_na(T1_sex)

The new data frame (the result I want to achieve) looks like this:


data2 <- data.frame(
      T1_age = c(20, 48, 27, 50, 50, 37),
      T1_sex = c(1, 2, 2, 2, 1, 1),
   T2_learn1 = c(3, 4, 1, 2, 4, 4),
   T2_learn2 = c(4, 4, 1, 2, 4, 4),
   T2_learn3 = c(4, 4, 1, 3, 4, 4),
   T2_learn4 = c(2, 5, 5, 5, 5, 5),
   T2_learn5 = c(3, 4, 3, 3, 4, 3),
     T2_aut1 = c(4, 4, 4, 3, 5, 4),
     T2_aut2 = c(4, 4, 4, 3, 5, 5),
     T2_aut3 = c(4, 4, 3, 3, 5, 5),
    T2_ssup1 = c(4, 5, 4, 2, 4, 3),
    T2_ssup2 = c(4, 5, 5, 3, 4, 4),
    T2_ssup3 = c(4, 5, 5, 4, 4, 4),
    T2_ssup4 = c(3, 5, 5, 3, 4, 4),
   T3_learn1 = c(NA, 4, 4, 3, 3, 4),
   T3_learn2 = c(NA, 4, 3, 3, 3, 4),
   T3_learn3 = c(NA, 4, 4, 3, 3, 5),
   T3_learn4 = c(NA, 5, 4, 4, 5, 5),
   T3_learn5 = c(NA, 3, 4, 3, 3, 4),
     T3_aut1 = c(NA, 4, 4, 3, 5, 5),
     T3_aut2 = c(NA, 3, 4, 3, 5, 5),
     T3_aut3 = c(NA, 3, 2, 3, 5, 5),
    T3_ssup1 = c(NA, 5, 4, 2, 4, 1),
    T3_ssup2 = c(NA, 5, 5, 4, 5, 5),
    T3_ssup3 = c(NA, 5, 5, 4, 5, 3),
    T3_ssup4 = c(NA, 5, 5, 4, 5, 4)
        )

Could you help me improve this? Thanks!

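The code below should work; I tested it by changing random elements to NA. I split the original data into the T2 and T3 columns (both subsets have the same number of columns, in the same order) and then used the vectorized is.na(). Note that [, 1] selects only the first column of the resulting logical matrix, so only T2_learn1 and T3_learn1 are actually checked. Applied to the original 10-row data, this keeps row 1 (whose aut columns are all NA), unlike data2.

data_T2 <- data %>% subset(select = c(3:14))   # T2 item columns
data_T3 <- data %>% subset(select = c(15:26))  # T3 item columns
# Drop rows in which the first item (learn1) is NA at both time points.
data[!(is.na(data_T2) & is.na(data_T3))[, 1], ]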
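
The T2/T3 pairing above could be extended to every group rather than only the first column. The following is a minimal sketch on a small toy data frame, not the answerer's code: the names toy, t2, t3, both_na, grp, group_all_na and kept are made up for illustration, and it assumes the T2 and T3 columns appear in the same order.

```r
# Toy data (hypothetical), using the same naming scheme as the question.
toy <- data.frame(
  T1_age    = c(39, 30, 20, 48),
  T2_learn1 = c(2, NA, 3, 4),
  T2_learn2 = c(1, NA, 4, 4),
  T2_aut1   = c(NA, NA, 4, 4),
  T3_learn1 = c(3, NA, NA, 4),
  T3_learn2 = c(1, NA, NA, 4),
  T3_aut1   = c(NA, NA, NA, 3)
)

t2 <- toy[grep("^T2_", names(toy))]
t3 <- toy[grep("^T3_", names(toy))]

# TRUE where an item is missing at both time points
# (assumption: columns of t2 and t3 are paired by position).
both_na <- is.na(t2) & is.na(t3)

# Group label per item: strip the time prefix and the trailing item number.
grp <- gsub("^T2_|[0-9]+$", "", colnames(t2))

# One column per group: TRUE where every item of the group
# is missing at both time points.
group_all_na <- sapply(unique(grp), function(g) {
  rowSums(both_na[, grp == g, drop = FALSE]) == sum(grp == g)
})

# Keep rows in which no group is entirely missing.
kept <- toy[rowSums(group_all_na) == 0, ]
```

In this toy example rows 1 and 2 are dropped (aut is entirely NA in row 1, everything is NA in row 2), while row 3 is kept because its T2 items are present.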

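You can iterate over the patterns with sapply, using grep to select each group of columns, and check whether the rowSums of NAs in each slice reaches that slice's number of columns.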

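V <- c('learn', 'aut', 'ssup')

# One column per group: TRUE where every column of that group is NA in the row.
all_na <- sapply(V, \(v) {
  cols <- grep(v, names(data))
  rowSums(is.na(data[cols])) == dim(data[cols])[2]
})

# Keep the rows in which no group is entirely NA.
res <- data[rowSums(all_na) == 0, ]

stopifnot(all.equal(res, data2, check.attributes=FALSE))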
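
If the same check is needed for other pattern lists, including single-column groups such as T1_age, it could be wrapped in a small function. This is only a sketch under assumptions: drop_all_na_groups is a hypothetical helper name, not part of any package, and the toy data frame is invented for the example.

```r
# Hypothetical helper: drop rows in which any pattern-defined
# group of columns is entirely NA.
drop_all_na_groups <- function(df, patterns) {
  drop <- rep(FALSE, nrow(df))
  for (p in patterns) {
    cols <- grep(p, names(df))
    if (length(cols) == 0) stop("pattern matches no column: ", p)
    # Flag the row if all columns matched by this pattern are NA.
    drop <- drop | rowSums(is.na(df[cols])) == length(cols)
  }
  df[!drop, , drop = FALSE]
}

# Toy data (hypothetical), following the question's naming scheme.
toy <- data.frame(
  T1_age    = c(39, NA, 20, 48),
  T2_learn1 = c(2, 1, 3, NA),
  T2_learn2 = c(1, 1, NA, NA),
  T2_aut1   = c(NA, 3, 4, 4),
  T3_aut1   = c(NA, 3, NA, 3)
)

kept <- drop_all_na_groups(toy, c("learn", "aut", "T1_age"))
```

Here only row 3 survives: row 1 has no aut values, row 2 has no T1_age, and row 4 has no learn values. A partially missing group (row 3 has one NA in learn and one in aut) does not cause a drop.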

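Alternatively, it may be enough to check only whether the number of NAs across all the "hot" columns (everything except the demographics) reaches the number of those columns. This is a weaker condition: it drops only rows in which every item column is NA. Applied to the original 10-row data, it keeps row 1 (whose aut columns are all NA), so the result has one row more than data2.

hot <- grep(paste(V, collapse='|'), names(data))
res1 <- data[rowSums(is.na(data[hot])) != dim(data[hot])[2], ]

nrow(res1)  # 7 when run on the original 10-row data; data2 has 6 rows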

data2 is the expected result data frame you provided in the OP. dim(data)[2] gives the same result as ncol(data).
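
The difference between the per-group check and the overall check can be seen on a tiny example. This is an illustrative sketch only; the toy data frame and the names hot, overall_drop and group_drop are assumptions made for the example.

```r
# Toy data (hypothetical): row 1 lacks all aut items, row 2 lacks everything.
toy <- data.frame(
  T2_learn1 = c(2, NA, 3),
  T2_learn2 = c(1, NA, 4),
  T2_aut1   = c(NA, NA, 4),
  T2_aut2   = c(NA, NA, 5)
)
V <- c("learn", "aut")

# Overall check: TRUE only where every item column is NA.
hot <- grep(paste(V, collapse = "|"), names(toy))
overall_drop <- rowSums(is.na(toy[hot])) == length(hot)

# Per-group check: TRUE where at least one group is entirely NA.
group_drop <- rowSums(sapply(V, function(v) {
  cols <- grep(v, names(toy))
  rowSums(is.na(toy[cols])) == length(cols)
})) > 0

overall_drop  # row 1: FALSE, row 2: TRUE, row 3: FALSE
group_drop    # row 1: TRUE,  row 2: TRUE, row 3: FALSE
```

The two checks agree only when every row is either free of fully missing groups or missing in all groups at once, which is worth verifying before relying on the shorter version.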

Note: R version 4.1.2 (2021-11-01)
