Deleting a line if multiple columns are NA - R solution
I want to delete rows only when all of the selected columns are NA.
Data here:
dput(df)
structure(list(record_id = c("BIV-1601-1250-E1", "BIV-1601-1250-E1",
"BIV-1601-1250-E1", "BIV-1601-1250-E1", "BIV-1601-1250-E1", "BIV-1601-1719-E1",
"BIV-1601-1719-E1", "BIV-1601-1719-E1", "BIV-1601-1719-E1", "BIV-1601-1719-E1",
"BIV-1402-1368-E1", "BIV-1402-1368-E1", "BIV-1402-1368-E1", "BIV-1402-1368-E1",
"BIV-1402-1368-E1", "BIV-1101-1038-E1", "BIV-1101-1038-E1", "BIV-1101-1038-E1",
"BIV-1101-1038-E1", "BIV-1101-1038-E1", "BIV-1701-1145-E1", "BIV-1701-1145-E1",
"BIV-1701-1145-E1", "BIV-1701-1145-E1", "BIV-1701-1145-E1", "BIV-1102-2040-E1",
"BIV-1102-2040-E1", "BIV-1102-2040-E1", "BIV-1102-2040-E1", "BIV-1102-2040-E1"
), DATE = structure(c(NA, 17478, 17480, 17479, NA, 18295, NA,
18296, 18296, NA, NA, 17912, 17914, 17934, NA, 17221, 17221,
17223, 17224, NA, NA, 17820, 17822, 17823, NA, NA, 18359, 18361,
18361, NA), class = "Date"), haemoglobin = structure(c(NA, 101,
NA, NA, NA, 100, NA, NA, NA, NA, NA, 97.6, NA, NA, NA, NA, 109,
NA, NA, NA, NA, 120, NA, NA, NA, NA, 205, NA, NA, NA), label = "g/L", class = c("labelled",
"numeric")), WBC = structure(c(NA, NA, "5", NA, NA, NA, "27.6",
NA, NA, NA, NA, NA, "8.8", NA, NA, NA, NA, "10.3", NA, NA, NA,
NA, "23.5", NA, NA, NA, NA, "11.81", NA, NA), label = "10^9/L", class = c("labelled",
"character")), CRP = c(NA, NA, "9", NA, NA, NA, "499", NA, NA,
NA, NA, NA, "7", NA, NA, NA, "43", "54.4", NA, NA, NA, NA, "37",
NA, NA, NA, NA, "<4.0", NA, NA), admission_day = c(NA, 0L, 2L,
1L, NA, 1L, NA, 2L, 2L, NA, NA, 1L, 3L, 23L, NA, 0L, 0L, 2L,
3L, NA, NA, 0L, 2L, 3L, NA, NA, 0L, 2L, 2L, NA)), row.names = c(NA,
-30L), groups = structure(list(record_id = c("BIV-1101-1038-E1",
"BIV-1102-2040-E1", "BIV-1402-1368-E1", "BIV-1601-1250-E1", "BIV-1601-1719-E1",
"BIV-1701-1145-E1"), .rows = structure(list(16:20, 26:30, 11:15,
1:5, 6:10, 21:25), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, 6L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
I only want to drop the rows where the following columns are all NA: DATE, haemoglobin, CRP, WBC, and admission_day. My thoughts were something like this:
library(dplyr)
cols_to_drop <- c("DATE", "haemoglobin", "CRP", "WBC", "admission_day")
df <- df %>% mutate(case_when(is.na(cols_to_drop) ~ drop_na(DATE)))
Obviously (as usual for me) this doesn't work... I think it has something to do with needing to assign the result of case_when to a particular variable, but I want it to apply across the whole dataframe.
If someone can help, I'd be grateful!
You can use if_all / if_any:
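library(dplyr)
cols_to_drop <- c("DATE", "haemoglobin", "CRP", "WBC", "admission_day")
# Keep a row unless every listed column is NA.
# all_of() selects columns named in a character vector.
df %>% filter(!if_all(all_of(cols_to_drop), is.na))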
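To see which rows this kind of filter keeps, here is a minimal sketch on a small made-up data frame. The `toy` data and the name `cols_to_check` are illustrative assumptions, not the data from the question. Only the row where every checked column is NA should be dropped; a row with at least one non-missing value stays.

```r
library(dplyr)

# Hypothetical toy data: row "b" is NA in both checked columns,
# row "c" is NA in only one of them
toy <- tibble(
  id   = c("a", "b", "c"),
  DATE = as.Date(c("2020-01-01", NA, NA)),
  CRP  = c("9", NA, "7")
)
cols_to_check <- c("DATE", "CRP")

# Keep a row unless all checked columns are NA
kept <- toy %>% filter(!if_all(all_of(cols_to_check), is.na))
kept$id
```

Here `kept$id` contains "a" and "c": row "b" is the only one missing in both columns.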
With if_any:
# Equivalent: keep a row if at least one listed column is not NA
df %>% filter(if_any(all_of(cols_to_drop), Negate(is.na)))
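If dplyr is not an option, the same rule can probably be expressed in base R by counting NAs per row with rowSums() over is.na(). This is an alternative technique, not part of the answer above, and the `toy` data frame and `cols_to_check` name are made up for illustration.

```r
# Hypothetical toy data: only row "b" is NA in every checked column
toy <- data.frame(
  id   = c("a", "b", "c"),
  DATE = as.Date(c("2020-01-01", NA, NA)),
  CRP  = c("9", NA, "7"),
  stringsAsFactors = FALSE
)
cols_to_check <- c("DATE", "CRP")

# Count the NAs in each row across the checked columns only
n_missing <- rowSums(is.na(toy[cols_to_check]))

# Keep rows where at least one checked column is non-missing
kept <- toy[n_missing < length(cols_to_check), , drop = FALSE]
kept$id
```

The comparison against `length(cols_to_check)` is what makes this an "all columns missing" test; comparing against 0 instead would drop any row with a single NA.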