R - subset rows based on column names (in a vector) and specific values in those columns
This is what my `df` looks like:
df <- data.frame(WoS = c(1L, NA, 1L, NA, 1L, NA), Scopus = c(1L, 1L, 1L, 1L, NA, NA), Dim = c(NA, NA, 1L, 1L, 1L, 1L), Lens = c(NA, NA, NA, 1L, NA, 1L))
or:
| WoS| Scopus| Dim| Lens| # (+ various other columns)
|---:|------:|---:|----:|
| 1| 1| NA| NA|
| NA| 1| NA| NA|
| 1| 1| 1| NA|
| NA| 1| 1| 1|
| 1| NA| 1| NA|
| NA| NA| 1| 1|
# (+ hundreds of other rows in which 1 and NAs are distributed among these four columns)
I want to subset `df` based on a vector in which column names are stored; at least one of these columns should equal `1`. The other columns, those not mentioned in `vec`, should be `NA`.
Example:
Say that I have a vector `vec <- c("WoS", "Scopus")`.
Then I want to select all rows where `df$WoS == 1` OR `df$Scopus == 1`, and where `is.na(df$Dim)` and `is.na(df$Lens)`:
| WoS| Scopus| Dim| Lens| # (+ keep all other columns ...)
|---:|------:|---:|----:|
| 1| 1| NA| NA|
| NA| 1| NA| NA|
| 1| NA| NA| NA|
| NA| 1| NA| NA|
| 1| 1| NA| NA|
What is the best way to do this?
We can store the column names in vectors and then apply `filter` once per condition.
library(dplyr)
target1 <- c("WoS", "Scopus")
target2 <- c("Dim", "Lens")
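df2 <- df %>%
  # keep rows where at least one target1 column is 1 (assumes values are 1 or NA)
  filter(rowSums(select(., all_of(target1)), na.rm = TRUE) >= 1) %>%
  # keep rows where every target2 column is NA
  filter(across(all_of(target2), .fns = is.na))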
df2
# WoS Scopus Dim Lens
# 1 1 1 NA NA
# 2 NA 1 NA NA
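For readers who would rather not load dplyr, the same two conditions can be sketched in base R. This is only an illustration under the assumptions of the question (the four columns hold `1` or `NA`); the names `any_one`, `rest_na` and `df_base` are made up for this example.

```r
df <- data.frame(
  WoS    = c(1L, NA, 1L, NA, 1L, NA),
  Scopus = c(1L, 1L, 1L, 1L, NA, NA),
  Dim    = c(NA, NA, 1L, 1L, 1L, 1L),
  Lens   = c(NA, NA, NA, 1L, NA, 1L)
)
target1 <- c("WoS", "Scopus")
target2 <- c("Dim", "Lens")

# TRUE where at least one target1 column equals 1 (NA is treated as "not 1")
any_one <- rowSums(df[target1] == 1, na.rm = TRUE) > 0
# TRUE where every target2 column is NA
rest_na <- rowSums(!is.na(df[target2])) == 0

# drop = FALSE keeps the result a data frame even with a single column
df_base <- df[any_one & rest_na, , drop = FALSE]
df_base
```

Comparing against `1` before summing means the test does not depend on the column values adding up to a particular total.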
If you would rather not use `rowSums`, since the values in some columns may not be strictly 1, we can switch to `filter` with `if_any`.
df2 <- df %>%
  filter(if_any(all_of(target1), .fns = function(x) x == 1)) %>%
  filter(across(all_of(target2), .fns = is.na))
df2
# WoS Scopus Dim Lens
# 1 1 1 NA NA
# 2 NA 1 NA NA
We can also replace `across` in the second `filter` call with `if_all`. Recent dplyr releases deprecate `across` inside `filter`, so this is the form to prefer.
df2 <- df %>%
  filter(if_any(all_of(target1), .fns = function(x) x == 1)) %>%
  filter(if_all(all_of(target2), .fns = is.na))
df2
# WoS Scopus Dim Lens
# 1 1 1 NA NA
# 2 NA 1 NA NA
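Since the question starts from a single vector `vec`, the second set of columns can be derived with `setdiff` instead of being typed by hand. The helper below is a rough sketch, not part of dplyr: `subset_by_sources` and its `sources` argument (the full list of indicator columns) are hypothetical names introduced for this example.

```r
library(dplyr)

df <- data.frame(
  WoS    = c(1L, NA, 1L, NA, 1L, NA),
  Scopus = c(1L, 1L, 1L, 1L, NA, NA),
  Dim    = c(NA, NA, 1L, 1L, 1L, 1L),
  Lens   = c(NA, NA, NA, 1L, NA, 1L)
)

# Hypothetical helper: `vec` holds the wanted columns,
# `sources` holds all indicator columns to consider
subset_by_sources <- function(data, vec, sources) {
  # columns that must be NA are the indicator columns not named in vec
  others <- setdiff(sources, vec)
  data %>%
    filter(if_any(all_of(vec), function(x) x == 1)) %>%
    filter(if_all(all_of(others), is.na))
}

sources <- c("WoS", "Scopus", "Dim", "Lens")
res_a <- subset_by_sources(df, c("WoS", "Scopus"), sources)  # rows 1 and 2 of df
res_b <- subset_by_sources(df, c("Dim", "Lens"), sources)    # row 6 of df
```

Any columns outside `sources` are left untouched, so the remaining columns of the real data frame are kept in the result.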