Using grep with two arguments in R

Question

I've got a rather simple question. I have variables called "T_01_X_1", "T_02_X_1", "T_03_X_1" and variables called "T_01_Y_1", "T_02_Y_1", "T_03_Y_1", and I want to use the grep function to extract only the variables that start with T and contain X. How can I do that?

df <- read.table(header=TRUE, text="
T_01_X_1 T_02_X_2 T_03_X_3 T_01_Y_1 T_02_Y_2 T_03_Y_3
1 2 3 2 1 3 
2 3 4 2 1 3
2 3 4 2 1 4 
2 4 5 2 1 3 
")

# my attempt: this returns every column, because "T.*" only requires a T somewhere in the name
items <- df[grep("T.*", names(df))]


Answer 1

We can use select() with matches():

library(dplyr)
df %>%
    select(matches('^T_\\d+_X'))
#  T_01_X_1 T_02_X_2 T_03_X_3
#1        1        2        3
#2        2        3        4
#3        2        3        4
#4        2        4        5
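If you would rather not load dplyr, the same regex can be passed to base R's grep(). One detail worth knowing: tidyselect's matches() ignores case by default (ignore.case = TRUE), so the sketch below sets ignore.case = TRUE to keep the two versions equivalent. It rebuilds the question's data frame so it runs on its own.

```r
# Rebuild the example data frame from the question
df <- read.table(header = TRUE, text = "
T_01_X_1 T_02_X_2 T_03_X_3 T_01_Y_1 T_02_Y_2 T_03_Y_3
1 2 3 2 1 3
2 3 4 2 1 3
2 3 4 2 1 4
2 4 5 2 1 3
")

# Same pattern as matches('^T_\\d+_X'); \\d is supported by R's default (TRE) regex engine
x_cols <- df[grep("^T_\\d+_X", names(df), ignore.case = TRUE)]
names(x_cols)
# [1] "T_01_X_1" "T_02_X_2" "T_03_X_3"
```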

Answer 2

You can use:

df[grep("^T.*X", names(df))]

#  T_01_X_1 T_02_X_2 T_03_X_3
#1        1        2        3
#2        2        3        4
#3        2        3        4
#4        2        4        5

This selects columns whose names start with 'T' and contain an 'X' somewhere after it.
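Because ".*" can absorb any characters, "^T.*X" also accepts names where the X is not in the second field. The sketch below uses a few made-up column names (hypothetical, not from the question) to compare it with a stricter pattern that requires X to be its own field right after the number.

```r
# Hypothetical column names chosen to show where the loose pattern over-matches
nms <- c("T_01_X_1", "T_01_Y_1", "T_01_Y_X", "TAX_RATE")

# Loose: starts with T and has an X anywhere later
loose <- grepl("^T.*X", nms)

# Strict: T, underscore, digits, underscore, then X followed by an underscore
strict <- grepl("^T_[0-9]+_X_", nms)

nms[loose]
# [1] "T_01_X_1" "T_01_Y_X" "TAX_RATE"
nms[strict]
# [1] "T_01_X_1"
```

For the column names in the question both patterns give the same result; the stricter one only matters if other names like these can appear.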

Answer 3

Just as another option, if you don't want to bother with a regex, dplyr select helpers can be useful.

library(dplyr)

df %>% 
  select(starts_with("T") & contains("X"))

#  T_01_X_1 T_02_X_2 T_03_X_3
#1        1        2        3
#2        2        3        4
#3        2        3        4
#4        2        4        5

You can also do something similar with stringr:

library(stringr)

df[str_starts(names(df), "T") & str_detect(names(df), "X")]
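Base R has regex-free helpers too: startsWith() tests a fixed prefix, and grepl() with fixed = TRUE treats the pattern as a literal string. A minimal sketch, rebuilding the df from the question:

```r
df <- read.table(header = TRUE, text = "
T_01_X_1 T_02_X_2 T_03_X_3 T_01_Y_1 T_02_Y_2 T_03_Y_3
1 2 3 2 1 3
2 3 4 2 1 3
2 3 4 2 1 4
2 4 5 2 1 3
")

# Each test returns one logical per column name; & keeps columns that pass both
keep <- startsWith(names(df), "T") & grepl("X", names(df), fixed = TRUE)
items <- df[keep]
names(items)
# [1] "T_01_X_1" "T_02_X_2" "T_03_X_3"
```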

Answer 4

I used the intersect function for this:

df[intersect(grep("^T", names(df)), grep("X", names(df)))]

#  T_01_X_1 T_02_X_2 T_03_X_3
#1        1        2        3
#2        2        3        4
#3        2        3        4
#4        2        4        5
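intersect() works here because each grep() call returns column positions, and intersect() keeps only the positions found by both. The same selection can be written with grepl(), which returns logical vectors that combine element-wise with &. Anchoring with "^T" (rather than "T.*") is what enforces the "starts with T" condition. A small sketch on the question's column names:

```r
nms <- c("T_01_X_1", "T_02_X_2", "T_03_X_3",
         "T_01_Y_1", "T_02_Y_2", "T_03_Y_3")

# Index-based: positions matching each condition, then their intersection
by_index <- intersect(grep("^T", nms), grep("X", nms))

# Logical-based: one TRUE/FALSE per name, combined element-wise, then converted to positions
by_logical <- which(grepl("^T", nms) & grepl("X", nms))

by_index
# [1] 1 2 3
identical(by_index, by_logical)
# [1] TRUE
```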

Answer 5

In regular expressions, the caret symbol ( ^ ) anchors the match to the beginning of the string, so ^T only matches names that start with T. To match a specific character anywhere after that, you can surround it with a match-anything pattern ( .* ), which matches any character ( . ) zero or more times ( * ).

This gives you the following regex to match what you are looking for: ^T.*X.*

df[grep("^T.*X.*", names(df))] 
#>   T_01_X_1 T_02_X_2 T_03_X_3
#> 1        1        2        3
#> 2        2        3        4
#> 3        2        3        4
#> 4        2        4        5
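To check each piece of the pattern, you can run grepl() on the names directly. Since grep() and grepl() only need a match somewhere in the string, the trailing ".*" does not change the result; the sketch below compares the two forms and shows that "." matches underscores and digits as well as letters.

```r
nms <- c("T_01_X_1", "T_02_X_2", "T_03_X_3",
         "T_01_Y_1", "T_02_Y_2", "T_03_Y_3")

with_tail    <- grepl("^T.*X.*", nms)
without_tail <- grepl("^T.*X", nms)

# "." matches any single character, here the underscore between T and X
grepl("^T.X", "T_X_1")
# [1] TRUE

identical(with_tail, without_tail)
# [1] TRUE
nms[with_tail]
# [1] "T_01_X_1" "T_02_X_2" "T_03_X_3"
```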

Question

5 answers

solution1
3 2021-06-04 18:34:39

solution2
2 2021-06-04 09:09:09

solution3
2 2021-06-04 18:40:44

solution4
1 2021-06-04 09:11:05

solution5
1 2021-06-04 09:19:23
