Positive and negative subsetting using dplyr::contains() and dplyr::select() in R
I'm trying to do positive subsetting using a combination of dplyr::select() and dplyr::contains(), with the goal of subsetting by multiple string matches.
Minimal working example: starting from df1, negative subsetting generates df2 as expected. In contrast, positive subsetting of df1 generates df3 (no columns), when I would have expected something like df4. Thanks for any help.
library(dplyr)

df1 <- data.frame("ppt_paint"=c(45,98,23),"het_heating"=c(1,1,2) ,"orm_wood"=c("QQ","OA","BB"), "hours"=c(4,6,4), "distance"=c(23,65,21))
# Negative subsetting: drops the three prefixed columns, as expected
df2 <- df1 %>% select(-contains("ppt_")) %>% select(-contains("het_")) %>% select(-contains("orm_"))
# Positive subsetting attempt: unexpectedly returns no columns
df3 <- df1 %>% select(contains("ppt_")) %>% select(contains("het_")) %>% select(contains("orm_"))
# What the positive subsetting was expected to return
df4 <- data.frame("ppt_paint"=c(45,98,23),"het_heating"=c(1,1,2) ,"orm_wood"=c("QQ","OA","BB"))
Think about what happens after df1 %>% select(contains("ppt_")), and have a look at the resulting data.frame. As asked, it retains only the one column whose name contains "ppt_". The subsequent select() calls cannot work as you expect because the other columns are no longer there, no matter what you pass to select().
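To make the point above concrete, here is a minimal sketch that stops the chain after each step and inspects what is left. It reuses df1 from the question; the names step1 and step2 are only illustrative.

```r
library(dplyr)

df1 <- data.frame(
  ppt_paint   = c(45, 98, 23),
  het_heating = c(1, 1, 2),
  orm_wood    = c("QQ", "OA", "BB"),
  hours       = c(4, 6, 4),
  distance    = c(23, 65, 21)
)

# Step 1: only the column whose name contains "ppt_" survives.
step1 <- df1 %>% select(contains("ppt_"))
names(step1)

# Step 2: "het_heating" is no longer present in step1, so nothing
# matches and the result has zero columns.
step2 <- step1 %>% select(contains("het_"))
ncol(step2)
```

Each select() only sees the columns left by the previous one, so chaining positive selections behaves like an intersection rather than a union.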
You can keep the same idea but combine your three keys in a single select() call:
df1 %>% select(matches("ppt_"), matches("het_"), matches("orm_"))
  ppt_paint het_heating orm_wood
1        45           1       QQ
2        98           1       OA
3        23           2       BB
Alternatively, since matches() accepts regular expressions, you can collapse the three keys into a single pattern:
df1 %>% select(matches("ppt_|het_|orm_"))
  ppt_paint het_heating orm_wood
1        45           1       QQ
2        98           1       OA
3        23           2       BB
By the way, you can also use it to shorten your "negative" indexing:
df1 %>% select(-matches("ppt_|het_|orm_"))
  hours distance
1     4       23
2     6       65
3     4       21
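If you would rather stay with contains() and literal (non-regex) matching, another option is to pass all the keys as one character vector. This is a sketch that assumes a reasonably recent tidyselect/dplyr, where the selection helpers are documented as taking the union of the matches when given a vector of length > 1; older versions may only accept a single string. The names keys, df_pos and df_neg are illustrative.

```r
library(dplyr)

df1 <- data.frame(
  ppt_paint   = c(45, 98, 23),
  het_heating = c(1, 1, 2),
  orm_wood    = c("QQ", "OA", "BB"),
  hours       = c(4, 6, 4),
  distance    = c(23, 65, 21)
)

# Assumption: contains() accepts a character vector and returns the
# union of the columns matching any element (recent tidyselect).
keys <- c("ppt_", "het_", "orm_")

# Positive subsetting: keep every column matching any of the keys.
df_pos <- df1 %>% select(contains(keys))

# Negative subsetting: drop every column matching any of the keys.
df_neg <- df1 %>% select(-contains(keys))

names(df_pos)
names(df_neg)
```

Keeping the keys in a vector avoids building a regular expression by hand, which helps when a key contains characters that are special in regex.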