r dplyr ends_with multiple string matches
Can I use dplyr::select(ends_with) to select column names that fit any of multiple conditions? Considering my column names, I want to use ends_with instead of contains or matches, because the strings I want to select are relevant at the end of the column name, but may also appear in the middle of others. For instance,
df <- data.frame(a10 = 1:4,
a11 = 5:8,
a20 = 1:4,
a12 = 5:8)
I want to select columns that end with 1 or 2, to have only columns a11 and a12. Is select(ends_with) the best way to do this?
Thanks!
You can also do this using regular expressions. I know you did not want to use matches initially, but it actually works quite well if you use the "end of string" symbol $. Separate your various endings with |.
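library(dplyr)  # provides select(), matches() and the %>% pipe

df <- data.frame(a10 = 1:4,
                 a11 = 5:8,
                 a20 = 1:4,
                 a12 = 5:8)
# '$' anchors each alternative to the end of the column name
df %>% select(matches('1$|2$'))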
a11 a12
1 5 5
2 6 6
3 7 7
4 8 8
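To see why the $ anchor matters here, the following sketch applies the same patterns to the column names with base R's grepl(). The variable names (cols, unanchored, anchored, char_class) are only illustrative and not part of the original answer.

```r
# Column names from the question
cols <- c("a10", "a11", "a20", "a12")

# Without the anchor, "1" or "2" may match anywhere in the name
unanchored <- cols[grepl("1|2", cols)]

# With "$", each alternative must match at the end of the name
anchored <- cols[grepl("1$|2$", cols)]

# For single-character endings, a character class is an equivalent spelling
char_class <- cols[grepl("[12]$", cols)]

print(unanchored)
print(anchored)
print(char_class)
```

The unanchored pattern keeps every column in this example, which is the behaviour the question wanted to avoid.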
If you have a more complex example with a long list, use paste0 with collapse = '|'.
dff <- data.frame(a11 = 1:3,
a12 = 2:4,
a13 = 3:5,
a16 = 5:7,
my_cat = LETTERS[1:3],
my_dog = LETTERS[5:7],
my_snake = LETTERS[9:11])
my_cols <- paste0(c(1,2,6,'dog','cat'),
'$',
collapse = '|')
dff %>% select(matches(my_cols))
a11 a12 a16 my_cat my_dog
1 1 2 5 A E
2 2 3 6 B F
3 3 4 7 C G
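It can help to look at the pattern string that paste0 builds before passing it to matches. The sketch below prints it and also shows a grouped variant with a single trailing $, which should select the same columns. The names endings, pattern_each and pattern_grouped are made up for illustration, and the sketch assumes none of the endings contain regex metacharacters (those would need escaping).

```r
# Endings from the example above; numbers are coerced to character by c()
endings <- c(1, 2, 6, "dog", "cat")

# One "$" per alternative, as in the answer
pattern_each <- paste0(endings, "$", collapse = "|")

# Grouped alternative: a single "$" applied to the whole group
pattern_grouped <- paste0("(", paste(endings, collapse = "|"), ")$")

print(pattern_each)
print(pattern_grouped)

# Check both patterns against the column names of dff using base R
cols <- c("a11", "a12", "a13", "a16", "my_cat", "my_dog", "my_snake")
print(cols[grepl(pattern_each, cols)])
print(cols[grepl(pattern_grouped, cols)])
```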
From dplyr version 1.0.0, you can combine multiple selections using Boolean logic such as ! (negate), & (and) and | (or).
### Install development version on GitHub first until CRAN version is available
# install.packages("devtools")
# devtools::install_github("tidyverse/dplyr")
library(dplyr, warn.conflicts = FALSE)
df <- data.frame(a10 = 1:4,
a11 = 5:8,
a20 = 1:4,
a12 = 5:8)
df %>%
select(ends_with("1") | ends_with("2"))
#> a11 a12
#> 1 5 5
#> 2 6 6
#> 3 7 7
#> 4 8 8
or use num_range() to select the desired columns
df %>%
select(num_range(prefix = "a", range = 11:12))
#> a11 a12
#> 1 5 5
#> 2 6 6
#> 3 7 7
#> 4 8 8
Created on 2020-02-17 by the reprex package (v0.3.0)
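As a possible shorthand for the | combination, ends_with() also accepts a character vector of suffixes and takes the union of the matches. This is a minimal sketch, assuming a dplyr installation that ships tidyselect 1.0.0 or later; res is just an illustrative name.

```r
library(dplyr, warn.conflicts = FALSE)

df <- data.frame(a10 = 1:4,
                 a11 = 5:8,
                 a20 = 1:4,
                 a12 = 5:8)

# Several suffixes in one call: a column is kept if it ends in any of them
res <- df %>%
  select(ends_with(c("1", "2")))

print(names(res))
```

This keeps the "end of name" semantics the question asked for without writing a regular expression.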
I don't know if ends_with() is the best way to do this, but you could also do this in base R with a logical index.
# Extract the last character of the column names, and test if it is "1" or "2"
lgl_index <- substr(x = names(df),
start = nchar(names(df)),
stop = nchar(names(df))) %in% c("1", "2")
With this index, you can subset the data frame as follows
df[, lgl_index]
a11 a12
1 5 5
2 6 6
3 7 7
4 8 8
or with dplyr::select()
select(df, which(lgl_index))
a11 a12
1 5 5
2 6 6
3 7 7
4 8 8
keeping only columns that end with either 1 or 2.
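The same logical index could also be built with grepl() or with base R's endsWith() (available since R 3.3.0). The sketch below is one way to do it, not the answer author's method. The names lgl_regex, lgl_ends and res are illustrative.

```r
df <- data.frame(a10 = 1:4,
                 a11 = 5:8,
                 a20 = 1:4,
                 a12 = 5:8)

# One logical per column name: TRUE if the name ends in "1" or "2"
lgl_regex <- grepl("[12]$", names(df))

# endsWith() is called once per suffix here; combine the results with |
lgl_ends <- endsWith(names(df), "1") | endsWith(names(df), "2")

# drop = FALSE keeps a data frame even when only one column matches
res <- df[, lgl_regex, drop = FALSE]

print(names(res))
```

Adding drop = FALSE is a defensive choice: without it, df[, index] returns a plain vector when the index selects a single column.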