Deleting a line if multiple columns are NA - R solution

I want to delete rows only when selected columns are NA.

Data here:

dput(df)
structure(list(record_id = c("BIV-1601-1250-E1", "BIV-1601-1250-E1", 
"BIV-1601-1250-E1", "BIV-1601-1250-E1", "BIV-1601-1250-E1", "BIV-1601-1719-E1", 
"BIV-1601-1719-E1", "BIV-1601-1719-E1", "BIV-1601-1719-E1", "BIV-1601-1719-E1", 
"BIV-1402-1368-E1", "BIV-1402-1368-E1", "BIV-1402-1368-E1", "BIV-1402-1368-E1", 
"BIV-1402-1368-E1", "BIV-1101-1038-E1", "BIV-1101-1038-E1", "BIV-1101-1038-E1", 
"BIV-1101-1038-E1", "BIV-1101-1038-E1", "BIV-1701-1145-E1", "BIV-1701-1145-E1", 
"BIV-1701-1145-E1", "BIV-1701-1145-E1", "BIV-1701-1145-E1", "BIV-1102-2040-E1", 
"BIV-1102-2040-E1", "BIV-1102-2040-E1", "BIV-1102-2040-E1", "BIV-1102-2040-E1"
), DATE = structure(c(NA, 17478, 17480, 17479, NA, 18295, NA, 
18296, 18296, NA, NA, 17912, 17914, 17934, NA, 17221, 17221, 
17223, 17224, NA, NA, 17820, 17822, 17823, NA, NA, 18359, 18361, 
18361, NA), class = "Date"), haemoglobin = structure(c(NA, 101, 
NA, NA, NA, 100, NA, NA, NA, NA, NA, 97.6, NA, NA, NA, NA, 109, 
NA, NA, NA, NA, 120, NA, NA, NA, NA, 205, NA, NA, NA), label = "g/L", class = c("labelled", 
"numeric")), WBC = structure(c(NA, NA, "5", NA, NA, NA, "27.6", 
NA, NA, NA, NA, NA, "8.8", NA, NA, NA, NA, "10.3", NA, NA, NA, 
NA, "23.5", NA, NA, NA, NA, "11.81", NA, NA), label = "10^9/L", class = c("labelled", 
"character")), CRP = c(NA, NA, "9", NA, NA, NA, "499", NA, NA, 
NA, NA, NA, "7", NA, NA, NA, "43", "54.4", NA, NA, NA, NA, "37", 
NA, NA, NA, NA, "<4.0", NA, NA), admission_day = c(NA, 0L, 2L, 
1L, NA, 1L, NA, 2L, 2L, NA, NA, 1L, 3L, 23L, NA, 0L, 0L, 2L, 
3L, NA, NA, 0L, 2L, 3L, NA, NA, 0L, 2L, 2L, NA)), row.names = c(NA, 
-30L), groups = structure(list(record_id = c("BIV-1101-1038-E1", 
"BIV-1102-2040-E1", "BIV-1402-1368-E1", "BIV-1601-1250-E1", "BIV-1601-1719-E1", 
"BIV-1701-1145-E1"), .rows = structure(list(16:20, 26:30, 11:15, 
    1:5, 6:10, 21:25), ptype = integer(0), class = c("vctrs_list_of", 
"vctrs_vctr", "list"))), row.names = c(NA, 6L), class = c("tbl_df", 
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"))

I only want to drop the rows when the following columns are all NA: DATE, haemoglobin, CRP, WBC, and admission_day. My thoughts were something like this:

library(dplyr)
cols_to_drop <- c("DATE", "haemoglobin", "CRP", "WBC", "admission_day")
df <- df %>% mutate(case_when(is.na(cols_to_drop) ~ drop_na(DATE)))

Obviously (as usual for me) this doesn't work... I think it's something to do with needing to make case_when equal to a particular variable, but I want it to apply across the whole data frame.

If someone can help, I'd be grateful!

You can use if_all / if_any:

library(dplyr)

cols_to_drop <- c("DATE", "haemoglobin", "CRP", "WBC", "admission_day")
# all_of() marks cols_to_drop as an external character vector of column names
df %>% filter(!if_all(all_of(cols_to_drop), is.na))

With if_any:

df %>% filter(if_any(all_of(cols_to_drop), Negate(is.na)))
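
To see that the two filters keep the same rows, here is a minimal sketch on made-up data. The `toy` tibble and the `cols` vector are hypothetical stand-ins, not the asker's `df`. The last line is a base R cross-check, a different technique from `if_all` / `if_any`, that counts NAs per row across the selected columns.

```r
library(dplyr)

# Hypothetical toy data: row "b" is NA in every selected column,
# row "c" is NA in only one of them.
toy <- tibble(
  id = c("a", "b", "c"),
  x  = c(1, NA, NA),
  y  = c("u", NA, "w")
)
cols <- c("x", "y")

# Drop a row only when ALL selected columns are NA
kept_if_all <- toy %>% filter(!if_all(all_of(cols), is.na))

# Equivalent: keep a row when ANY selected column is not NA
kept_if_any <- toy %>% filter(if_any(all_of(cols), ~ !is.na(.x)))

# Base R cross-check: keep rows with fewer NAs than selected columns
kept_base <- toy[rowSums(is.na(toy[cols])) < length(cols), ]

kept_if_all$id
```

In this sketch all three results keep rows "a" and "c" and drop only row "b". Row "c" survives because one of its selected columns still holds a value.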
