
Using grep with two arguments in R

I have a fairly simple question. I have variables named "T_01_X_1", "T_02_X_1", "T_03_X_1" and variables named "T_01_Y_1", "T_02_Y_1", "T_03_Y_1". Using the grep function, I want to extract only the variables that start with T and contain X. How can I do that?

df <- read.table(header=TRUE, text="
T_01_X_1 T_02_X_2 T_03_X_3 T_01_Y_1 T_02_Y_2 T_03_Y_3
1 2 3 2 1 3 
2 3 4 2 1 3
2 3 4 2 1 4 
2 4 5 2 1 3 
")

items <- df[grep("T.*", names(df))] 

best!

We can use select with matches from dplyr:

library(dplyr)
df %>%
    select(matches('^T_\\d+_X'))
  T_01_X_1 T_02_X_2 T_03_X_3
1        1        2        3
2        2        3        4
3        2        3        4
4        2        4        5

You can use:

df[grep("^T.*X", names(df))]

#  T_01_X_1 T_02_X_2 T_03_X_3
#1        1        2        3
#2        2        3        4
#3        2        3        4
#4        2        4        5

This selects the columns whose names start with 'T' and have an 'X' anywhere after it.
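To see what the anchor in the pattern above contributes, here is a minimal sketch on a plain character vector. The vector nms is an assumption for illustration: it holds the six column names from the question plus one made-up name, "X_T_1", which is not in the original data.

```r
# Hypothetical name vector: the six columns from the question plus one
# extra name ("X_T_1") added here only to show what the anchor does.
nms <- c("T_01_X_1", "T_02_X_2", "T_03_X_3",
         "T_01_Y_1", "T_02_Y_2", "T_03_Y_3", "X_T_1")

# Unanchored: matches any name that contains a "T" anywhere,
# so nothing is filtered out here.
all_t <- grep("T.*", nms, value = TRUE)

# Anchored: the name must start with "T" and have an "X" somewhere after it.
t_then_x <- grep("^T.*X", nms, value = TRUE)

print(all_t)
print(t_then_x)
```

With value = TRUE, grep returns the matching names instead of their positions, which makes it easier to check a pattern before using it to subset the data frame.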

Just as another option, if you don't want to bother with a regex, dplyr select helpers can be useful.

library(dplyr)

df %>% 
  select(starts_with("T") & contains("X"))

#  T_01_X_1 T_02_X_2 T_03_X_3
#1        1        2        3
#2        2        3        4
#3        2        3        4
#4        2        4        5

You can also do something similar with stringr.

library(stringr)

df[str_starts(names(df), "T") & str_detect(names(df), "X")]
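If you would rather not load a package, the same two conditions can probably be written in base R as well. This is only a sketch: it assumes R 3.3.0 or later for startsWith, uses a shortened two-row copy of the question's data, and the names keep and x_cols are introduced here for illustration.

```r
# Shortened copy of the example data from the question (two rows only).
df <- read.table(header = TRUE, text = "
T_01_X_1 T_02_X_2 T_03_X_3 T_01_Y_1 T_02_Y_2 T_03_Y_3
1 2 3 2 1 3
2 3 4 2 1 3
")

# startsWith() tests a literal prefix; grepl(fixed = TRUE) tests for a
# literal substring, so no regular expression is involved.
keep <- startsWith(names(df), "T") & grepl("X", names(df), fixed = TRUE)

# Subset the data frame with the logical vector.
x_cols <- df[keep]
print(names(x_cols))
```

Both conditions produce logical vectors of the same length as names(df), so they can be combined with & in the same way as the stringr version.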

I used the intersect function for this:

df[intersect(grep("^T", names(df)), grep("X", names(df)))]

  T_01_X_1 T_02_X_2 T_03_X_3
1        1        2        3
2        2        3        4
3        2        3        4
4        2        4        5

In regular expressions, the caret symbol ( ^ ) anchors the match to the beginning of the string, so ^T only matches names that start with T. To match a specific character anywhere later in the name, put a match-anything expression ( .* ) in front of it: the dot ( . ) accepts any single character and the star ( * ) repeats it zero or more times.

This gives you the following regex to match what you are looking for: ^T.*X.* (the trailing .* is optional, because grep only needs the pattern to match part of the name).

df[grep("^T.*X.*", names(df))] 
#>   T_01_X_1 T_02_X_2 T_03_X_3
#> 1        1        2        3
#> 2        2        3        4
#> 3        2        3        4
#> 4        2        4        5
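As a quick check of the pattern above, the sketch below compares it with a version that has no trailing .* and with a stricter variant. The stricter pattern "^T_[0-9]+_X_" is an assumption about the naming scheme (T, digits, X, separated by underscores) and is not taken from the answer; nms simply repeats the column names from the question.

```r
# Column names from the question.
nms <- c("T_01_X_1", "T_02_X_2", "T_03_X_3",
         "T_01_Y_1", "T_02_Y_2", "T_03_Y_3")

# grep() looks for the pattern anywhere in each string, so the trailing
# ".*" does not change which positions are returned.
with_tail    <- grep("^T.*X.*", nms)
without_tail <- grep("^T.*X", nms)
print(identical(with_tail, without_tail))

# Stricter variant (assumed naming scheme): "T_", one or more digits, "_X_".
strict <- grep("^T_[0-9]+_X_", nms, value = TRUE)
print(strict)
```

The stricter form may be worth the extra typing if other column names could start with T and happen to contain an X somewhere else.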
