R: Return rows with only 1 non-NA value for a set of columns
Suppose I have a data.table with the following data:
colA colB colC result
1 2 3 231
1 NA 2 123
NA 3 NA 345
11 NA NA 754
How would I use dplyr and magrittr to select only the following rows:
colA colB colC result
NA 3 NA 345
11 NA NA 754
The selection criterion is: exactly 1 non-NA value among columns A through C (i.e. colA, colB, colC).
I have been unable to find a similar question; I am guessing this is an unusual situation.
A base R option would be
df[apply(df, 1, function(x) sum(!is.na(x)) == 1), ]
# colA colB colC
#3 NA 3 NA
#4 11 NA NA
A dplyr option is
df %>% filter(rowSums(!is.na(.)) == 1)
In response to your comment, you can do
df[apply(df[, -ncol(df)], 1, function(x) sum(!is.na(x)) == 1), ]
# colA colB colC result
#3 NA 3 NA 345
#4 11 NA NA 754
Or the same in dplyr
df %>% filter(rowSums(!is.na(.[-length(.)])) == 1)
This assumes that the last column is the one you'd like to ignore.
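If relying on the position of the ignored column feels fragile, the columns to check can be named explicitly instead. The following is a minimal base R sketch of that idea; the helper vector `check_cols` is a hypothetical name introduced here for illustration, and the data frame is rebuilt from the question's table.

```r
# Example data matching the question
df <- data.frame(
  colA   = c(1L, 1L, NA, 11L),
  colB   = c(2L, NA, 3L, NA),
  colC   = c(3L, 2L, NA, NA),
  result = c(231L, 123L, 345L, 754L)
)

# Columns to check, named explicitly (hypothetical helper; adjust as needed)
check_cols <- c("colA", "colB", "colC")

# Count the non-NA values per row across those columns only
n_non_na <- rowSums(!is.na(df[check_cols]))

# Keep rows with exactly one non-NA value
res <- df[n_non_na == 1, ]
res
#   colA colB colC result
# 3   NA    3   NA    345
# 4   11   NA   NA    754
```

Because the columns are selected by name, the result does not change if `result` is moved or further columns are added.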
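# Data without the result column (first two examples)
df <- read.table(text = "colA colB colC
1 2 3
1 NA 2
NA 3 NA
11 NA NA", header = TRUE)
# Data with the result column (examples after the comment)
df <- read.table(text =
"colA colB colC result
1 2 3 231
1 NA 2 123
NA 3 NA 345
11 NA NA 754
", header = TRUE)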
Another option is filter with map
library(dplyr)
library(purrr)
df %>%
  filter(map(select(., starts_with('col')), ~ !is.na(.)) %>%
           reduce(`+`) == 1)
# colA colB colC result
#1 NA 3 NA 345
#2 11 NA NA 754
Or another option is to use transmute_at
df %>%
  transmute_at(vars(starts_with('col')), ~ !is.na(.)) %>%
  reduce(`+`) %>%
  magrittr::equals(1) %>%
  filter(df, .)
# colA colB colC result
#1 NA 3 NA 345
#2 11 NA NA 754
df <- structure(list(colA = c(1L, 1L, NA, 11L), colB = c(2L, NA, 3L,
NA), colC = c(3L, 2L, NA, NA), result = c(231L, 123L, 345L, 754L
)), class = "data.frame", row.names = c(NA, -4L))
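Both variants above build one logical vector per column and add them elementwise to obtain a per-row count of non-NA values. As a rough illustration of that mechanism without purrr, here is a base R sketch using `lapply` and `Reduce`; the names `present` and `n_present` are hypothetical and not part of the original answer.

```r
# Example data matching the question
df <- data.frame(
  colA   = c(1L, 1L, NA, 11L),
  colB   = c(2L, NA, 3L, NA),
  colC   = c(3L, 2L, NA, NA),
  result = c(231L, 123L, 345L, 754L)
)

# One logical vector per checked column: TRUE where a value is present
present <- lapply(df[c("colA", "colB", "colC")], function(x) !is.na(x))

# Adding the vectors elementwise gives the number of non-NA values per row
n_present <- Reduce(`+`, present)
n_present
# [1] 3 2 1 1

# Keep the rows where exactly one of the checked columns is non-NA
kept <- df[n_present == 1, ]
kept
```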
I think this would be possible with filter_at, but I was not able to make it work. Here is one attempt with filter and pmap_lgl, where you can specify the range of columns in select, specify them by position, or use other tidyselect helpers.
library(dplyr)
library(purrr)
df %>%
  filter(pmap_lgl(select(., colA:colC), ~ sum(!is.na(c(...))) == 1))
# colA colB colC result
#1 NA 3 NA 345
#2 11 NA NA 754
Data
df <- structure(list(colA = c(1L, 1L, NA, 11L), colB = c(2L, NA, 3L,
NA), colC = c(3L, 2L, NA, NA), result = c(231L, 123L, 345L, 754L
)), class = "data.frame", row.names = c(NA, -4L))