Positive and negative subsetting using dplyr::contains() and dplyr::select() in R

I'm trying to do positive subsetting using a combination of dplyr::select() and dplyr::contains(), with the goal of subsetting by multiple string matches.

Minimal working example: starting from df1, negative subsetting produces df2 as expected. In contrast, when I attempt positive subsetting of df1, I get df3 (no columns), whereas I expected something like df4. Thanks for any help.

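library(dplyr)  # provides select(), contains() and the %>% pipe

df1 <- data.frame("ppt_paint"=c(45,98,23), "het_heating"=c(1,1,2), "orm_wood"=c("QQ","OA","BB"), "hours"=c(4,6,4), "distance"=c(23,65,21))
# negative subsetting: drops the three prefixed columns, as expected
df2 <- df1 %>% select(-contains("ppt_")) %>% select(-contains("het_")) %>% select(-contains("orm_"))
# positive subsetting: ends up with no columns
df3 <- df1 %>% select(contains("ppt_")) %>% select(contains("het_")) %>% select(contains("orm_"))
# the result I expected from the positive subsetting
df4 <- data.frame("ppt_paint"=c(45,98,23), "het_heating"=c(1,1,2), "orm_wood"=c("QQ","OA","BB"))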

Think about what happens after df1 %>% select(contains("ppt_")), and have a look at the resulting data.frame. As asked, it retains only the one column whose name contains "ppt_". The subsequent select() calls cannot work as you expect because the other columns are no longer there, no matter what you feed select() with.
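
To make this concrete, here is a minimal sketch that inspects the intermediate results of the chained calls. The names step1 and step2 are only illustrative and do not appear in the question.

```r
library(dplyr)

df1 <- data.frame(
  ppt_paint   = c(45, 98, 23),
  het_heating = c(1, 1, 2),
  orm_wood    = c("QQ", "OA", "BB"),
  hours       = c(4, 6, 4),
  distance    = c(23, 65, 21)
)

# Step 1: only the column whose name contains "ppt_" survives.
step1 <- df1 %>% select(contains("ppt_"))
names(step1)

# Step 2: step1 has no column whose name contains "het_",
# so the selection is empty and no columns are left.
step2 <- step1 %>% select(contains("het_"))
ncol(step2)
```

Each select() only sees the columns that the previous one kept, so chaining positive selections narrows the result instead of accumulating columns.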

You can keep the same idea but combine your three keys in a single select():

df1 %>% select(contains("ppt_"), contains("het_"), contains("orm_"))
  ppt_paint het_heating orm_wood
1        45           1       QQ
2        98           1       OA
3        23           2       BB

Alternatively, you can use matches(), which accepts regular expressions:

df1 %>% select(matches("ppt_|het_|orm_"))
  ppt_paint het_heating orm_wood
1        45           1       QQ
2        98           1       OA
3        23           2       BB
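
If the keys live in a vector, the same selection can be sketched in two ways. This assumes a reasonably recent tidyselect (to my knowledge 1.0.0 or later, as shipped with dplyr 1.0 and newer), where contains() accepts a character vector; the name keys is hypothetical and not part of the question.

```r
library(dplyr)

df1 <- data.frame(
  ppt_paint   = c(45, 98, 23),
  het_heating = c(1, 1, 2),
  orm_wood    = c("QQ", "OA", "BB"),
  hours       = c(4, 6, 4),
  distance    = c(23, 65, 21)
)

# Hypothetical vector holding the three keys.
keys <- c("ppt_", "het_", "orm_")

# Assumption: contains() takes a character vector (recent tidyselect).
df_vec <- df1 %>% select(contains(keys))

# Version-independent alternative: build the regular expression for matches().
df_rx <- df1 %>% select(matches(paste(keys, collapse = "|")))

# Both approaches should keep the same three columns.
identical(names(df_vec), names(df_rx))
```

Note that contains() matches literal substrings, while matches() interprets the pattern as a regular expression, so keys containing regex metacharacters would need escaping in the second form.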

By the way, you can also use it to shorten your "negative" indexing:

df1 %>% select(-matches("ppt_|het_|orm_"))
  hours distance
1     4       23
2     6       65
3     4       21
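
As a quick sanity check, the following sketch compares the chained negative selection from the question with the single-call version. The name df2_short is only illustrative.

```r
library(dplyr)

df1 <- data.frame(
  ppt_paint   = c(45, 98, 23),
  het_heating = c(1, 1, 2),
  orm_wood    = c("QQ", "OA", "BB"),
  hours       = c(4, 6, 4),
  distance    = c(23, 65, 21)
)

# Chained negative selection, as in the question.
df2 <- df1 %>%
  select(-contains("ppt_")) %>%
  select(-contains("het_")) %>%
  select(-contains("orm_"))

# Single negative selection with one regular expression.
df2_short <- df1 %>% select(-matches("ppt_|het_|orm_"))

# Both should keep only the hours and distance columns.
names(df2)
names(df2_short)
```

Chaining works for the negative case because each step only removes columns, so the order and number of steps do not matter.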
