I have a rather simple question. I have variables called "T_01_X_1", "T_02_X_1", "T_03_X_1" and variables called "T_01_Y_1", "T_02_Y_1", "T_03_Y_1", and I want to extract only the variables that start with T and include X, using the grep function. How can I do that?
df <- read.table(header=TRUE, text="
T_01_X_1 T_02_X_2 T_03_X_3 T_01_Y_1 T_02_Y_2 T_03_Y_3
1 2 3 2 1 3
2 3 4 2 1 3
2 3 4 2 1 4
2 4 5 2 1 3
")
items <- df[grep("T.*", names(df))]
best!
We can use select with matches from dplyr:
library(dplyr)
df %>%
select(matches('^T_\\d+_X'))
T_01_X_1 T_02_X_2 T_03_X_3
1 1 2 3
2 2 3 4
3 2 3 4
4 2 4 5
You can use -
df[grep("^T.*X", names(df))]
# T_01_X_1 T_02_X_2 T_03_X_3
#1 1 2 3
#2 2 3 4
#3 2 3 4
#4 2 4 5
This will select columns whose names start with 'T' and have an 'X' anywhere after it.
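To see why the pattern from the question returned every column, it may help to run both patterns on the names alone. This is only a small sketch: nms is a hypothetical vector that copies the column names from the example data, and value = TRUE makes grep return the matching names instead of their positions.

```r
# Column names copied from the question's example data (hypothetical vector)
nms <- c("T_01_X_1", "T_02_X_2", "T_03_X_3",
         "T_01_Y_1", "T_02_Y_2", "T_03_Y_3")

# The question's pattern only requires a "T" somewhere, so every name matches
all_t <- grep("T.*", nms, value = TRUE)

# Anchoring with ^ and requiring an "X" later keeps only the X columns
x_only <- grep("^T.*X", nms, value = TRUE)

print(all_t)
print(x_only)
```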
Just as another option, if you don't want to bother with a regex, the dplyr select helpers can be useful. Note that starts_with and contains ignore case by default (ignore.case = TRUE).
library(dplyr)
df %>%
select(starts_with("T") & contains("X"))
# T_01_X_1 T_02_X_2 T_03_X_3
#1 1 2 3
#2 2 3 4
#3 2 3 4
#4 2 4 5
You can also do something similar with stringr.
library(stringr)
df[str_starts(names(df), "T") & str_detect(names(df), "X")]
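If you would rather not load a package, roughly the same idea should work in base R with startsWith (available since R 3.3.0) and grepl with fixed = TRUE, which treats "X" as a literal string rather than a regex. This is a sketch on a shortened copy of the example data; keep and items are just illustrative names.

```r
# Shortened copy of the example data from the question
df <- read.table(header = TRUE, text = "
T_01_X_1 T_02_X_2 T_03_X_3 T_01_Y_1 T_02_Y_2 T_03_Y_3
1 2 3 2 1 3
2 3 4 2 1 3
")

# Logical vector: TRUE where the name starts with "T" and contains a literal "X"
keep <- startsWith(names(df), "T") & grepl("X", names(df), fixed = TRUE)

# Subset the columns with the logical vector
items <- df[keep]
print(names(items))
```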
I used the intersect function for this, combining the names that start with T with the names that contain X:
df[intersect(grep("^T", names(df)), grep("X", names(df)))]
T_01_X_1 T_02_X_2 T_03_X_3
1 1 2 3
2 2 3 4
3 2 3 4
4 2 4 5
In regular expressions, you can anchor a match to the beginning of the string with the caret symbol (^). To match a specific character anywhere in the word, you can surround that character with a match-anything pattern (.*), which accepts any character (.) any number of times (*).
This gives you the following regex to match what you are looking for: ^T.*X.*
df[grep("^T.*X.*", names(df))]
#> T_01_X_1 T_02_X_2 T_03_X_3
#> 1 1 2 3
#> 2 2 3 4
#> 3 2 3 4
#> 4 2 4 5
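The example data cannot show what the caret adds, because every column there already starts with T. A quick check on some made-up names (the vector nms below is hypothetical, not from the question) suggests how each piece behaves, and that the trailing .* does not change which names grep selects.

```r
# Hypothetical names, chosen so that the anchor makes a difference
nms <- c("T_01_X_1", "AT_X", "X_T_01", "T_01_Y_1")

# Without the caret, a "T" anywhere followed later by an "X" is enough
unanchored <- grep("T.*X", nms, value = TRUE)

# With the caret, the name itself has to begin with "T"
anchored <- grep("^T.*X", nms, value = TRUE)

# The trailing .* is optional here: both patterns flag the same names
same_result <- identical(grepl("^T.*X", nms), grepl("^T.*X.*", nms))

print(unanchored)
print(anchored)
print(same_result)
```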