R - subset rows based on column names (in a vector) and specific values in those columns

This is what my df looks like:

df <- data.frame(WoS = c(1L, NA, 1L, NA, 1L, NA), Scopus = c(1L, 1L, 1L, 1L, NA, NA), Dim = c(NA, NA, 1L, 1L, 1L, 1L), Lens = c(NA, NA, NA, 1L, NA, 1L))

or:

| WoS| Scopus| Dim| Lens| # (+ various other columns)
|---:|------:|---:|----:|
|   1|      1|  NA|   NA|
|  NA|      1|  NA|   NA|
|   1|      1|   1|   NA|
|  NA|      1|   1|    1|
|   1|     NA|   1|   NA|
|  NA|     NA|   1|    1|

# (+ hundreds of other rows in which 1 and NAs are distributed among these four columns)

I want to subset df based on a vector vec that stores column names: at least one of the columns named in vec should equal 1.

The remaining source columns, those not named in vec, should all be NA.

Example:

Say that I have a vector vec <- c("WoS", "Scopus").

Then I want to select all rows where df$WoS == 1 OR df$Scopus == 1, and where both is.na(df$Dim) and is.na(df$Lens) hold:

| WoS| Scopus| Dim| Lens| # (+ keep all other columns ...)
|---:|------:|---:|----:|
|   1|      1|  NA|   NA|
|  NA|      1|  NA|   NA|
|   1|     NA|  NA|   NA|
|  NA|      1|  NA|   NA|
|   1|      1|  NA|   NA|

What is the best way to do this?

We can store the two groups of column names in vectors and then apply one filter per condition: the first keeps rows where at least one target1 column is 1, the second keeps rows where every target2 column is NA.

library(dplyr)

target1 <- c("WoS", "Scopus")
target2 <- c("Dim", "Lens")

df2 <- df %>%
  filter(rowSums(select(., all_of(target1)), na.rm = TRUE) >= 1) %>%
  filter(across(all_of(target2), .fns = is.na))
df2
#   WoS Scopus Dim Lens
# 1   1      1  NA   NA
# 2  NA      1  NA   NA
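
To make the two conditions easier to inspect, here is a rough base R sketch (not the dplyr approach used in this answer) that builds each condition as a logical vector on the question's example df and then combines them. The names has_one and rest_na are mine, chosen only for illustration.

```r
# Example data from the question
df <- data.frame(WoS = c(1L, NA, 1L, NA, 1L, NA),
                 Scopus = c(1L, 1L, 1L, 1L, NA, NA),
                 Dim = c(NA, NA, 1L, 1L, 1L, 1L),
                 Lens = c(NA, NA, NA, 1L, NA, 1L))

target1 <- c("WoS", "Scopus")  # at least one of these must equal 1
target2 <- c("Dim", "Lens")    # all of these must be NA

# Count the target1 columns equal to 1 in each row (NA counts as "not 1")
has_one <- rowSums(df[target1] == 1, na.rm = TRUE) >= 1  # TRUE for rows 1-5, FALSE for row 6

# Count the non-NA target2 columns in each row; zero means all are NA
rest_na <- rowSums(!is.na(df[target2])) == 0             # TRUE for rows 1-2 only

result <- df[has_one & rest_na, ]
result
```

Only rows 1 and 2 satisfy both conditions, which matches the output above.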

If you would rather not rely on rowSums, for example because some columns may hold values other than 1 or NA, you can test each column explicitly with filter and if_any instead.

df2 <- df %>%
  filter(if_any(all_of(target1), .fns = function(x) x == 1)) %>%
  filter(across(all_of(target2), .fns = is.na))
df2
#   WoS Scopus Dim Lens
# 1   1      1  NA   NA
# 2  NA      1  NA   NA

We can also replace across in the second filter call with if_all. As far as I know, dplyr 1.0.8 and later deprecate across inside filter, so this is the form to prefer on recent versions.

df2 <- df %>%
  filter(if_any(all_of(target1), .fns = function(x) x == 1)) %>%
  filter(if_all(all_of(target2), .fns = is.na))
df2
#   WoS Scopus Dim Lens
# 1   1      1  NA   NA
# 2  NA      1  NA   NA
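
Since the question starts from a single vector vec, the second group of columns can be derived rather than typed out. The following is a minimal sketch under that assumption: the function name subset_by_sources and its source_cols argument are hypothetical, and the default list of four source columns is taken from the example data. It uses .x %in% 1 instead of x == 1 so that NA is treated as FALSE rather than propagated.

```r
library(dplyr)

# Example data from the question
df <- data.frame(WoS = c(1L, NA, 1L, NA, 1L, NA),
                 Scopus = c(1L, 1L, 1L, 1L, NA, NA),
                 Dim = c(NA, NA, 1L, 1L, 1L, 1L),
                 Lens = c(NA, NA, NA, 1L, NA, 1L))

# Hypothetical helper: keep rows where at least one column in `vec` is 1
# and every other source column is NA. Columns outside `source_cols`
# are kept in the result but ignored by the conditions.
subset_by_sources <- function(data, vec,
                              source_cols = c("WoS", "Scopus", "Dim", "Lens")) {
  others <- setdiff(source_cols, vec)  # source columns not named in vec
  data %>%
    filter(if_any(all_of(vec), ~ .x %in% 1)) %>%
    filter(if_all(all_of(others), is.na))
}

res_a <- subset_by_sources(df, c("WoS", "Scopus"))
res_b <- subset_by_sources(df, c("Dim", "Lens"))
res_a
res_b
```

With the example df, the first call keeps the two rows shown above, and the second keeps the single row where only Dim and Lens are set. The case where vec names every source column (so that others is empty) is not covered by this sketch.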
