Replacing all characters between two patterns in R

Question

I have a data frame with the following column:

  Col_A
tr_1 A1; gn_1 TG1;
tr_2 A2; gn_2 TG2;
tr_3 A3; gn_3 TG3;
tr_4 A4; gn_4 TG4;
tr_5 A5; gn_5 TG5;

I would like to use gsub and a regular expression to remove, in every row of the data frame, all the characters from the beginning of the string through the end of the "gn_1" part (gn_2, gn_3, and so on in the other rows). In other words, I want to replace all those characters with "".

The result I want looks like this:

 Col_A
 TG1
 TG2
 TG3
 TG4
 TG5

Do you have any idea how I can do this in R?

Answer 1

The following regex will do what you want.

sub("^.*gn_\\d+\\s([[:alnum:]]+).*$", "\\1", df1$Col_A)
#[1] "TG1" "TG2" "TG3" "TG4" "TG5"

Data in dput format.

df1 <-
structure(list(Col_A = structure(1:5, 
.Label = c("tr_1 A1; gn_1 TG1;", "tr_2 A2; gn_2 TG2;", 
"tr_3 A3; gn_3 TG3;", "tr_4 A4; gn_4 TG4;", 
"tr_5 A5; gn_5 TG5;"), class = "factor")), 
class = "data.frame", row.names = c(NA, -5L))
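
To see why the pattern above works, it may help to read it piece by piece. The following is only a sketch: x is a hypothetical character vector standing in for df1$Col_A, with an extra two-digit row added to show that \\d+ is not limited to one digit.

```r
# Hypothetical stand-in for df1$Col_A, stored as plain character strings
x <- c("tr_1 A1; gn_1 TG1;", "tr_2 A2; gn_2 TG2;", "tr_10 A10; gn_10 TG10;")

# ^.*            everything from the start of the string ...
# gn_\\d+        ... up to "gn_" followed by one or more digits
# \\s            the single whitespace character after the gene id
# ([[:alnum:]]+) capture group 1: the run of letters/digits to keep
# .*$            the rest of the string (here, the trailing ";")
pattern <- "^.*gn_\\d+\\s([[:alnum:]]+).*$"

# Replace the whole string with the contents of capture group 1
result <- sub(pattern, "\\1", x)
print(result)

# A string that does not match the pattern is returned unchanged by sub()
unmatched <- sub(pattern, "\\1", "no gene id here")
print(unmatched)
```

Because the pattern spans the whole string from ^ to $, sub() swaps the entire string for the captured word, which is why a single call is enough.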

Answer 2

You could always use the stringi package:

library(stringi)
stri_extract_last_words(df1$Col_A)
#[1] "TG1" "TG2" "TG3" "TG4" "TG5"

EDIT: I just re-read your question. This assumes there is always exactly one word after gn_#, so use it with caution.
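
If a row ever carried more fields after gn_#, taking the last word would presumably pick up the wrong token. One way to avoid relying on position is to anchor on gn_# itself. This is a base R sketch, not part of the stringi answer; the second row of x and the name after_gn are made up for illustration.

```r
# Hypothetical input: the second row has an extra field after the gn_ entry
x <- c("tr_1 A1; gn_1 TG1;", "tr_2 A2; gn_2 TG2; note_2 N2;")

# regexec() records the full match plus capture group 1 for each string
m <- regmatches(x, regexec("gn_\\d+\\s+([[:alnum:]]+)", x))

# Keep capture group 1 (the word right after gn_#), or NA when nothing matched
after_gn <- vapply(
  m,
  function(hit) if (length(hit) == 2) hit[2] else NA_character_,
  character(1)
)
print(after_gn)
```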

Answer 3

I got what I wanted with the following commands. I am posting them here in case anyone else is looking for the answer.

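# Drop everything from the leading "tr" through "gn_", one more character, and a space
DF$col <- gsub("^tr.*gn_. ", "", DF$col)

# Remove the trailing semicolon
DF$col <- gsub(";", "", DF$col)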
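
Note that "gn_." matches exactly one character after "gn_", so ids with two or more digits are left untouched. The sketch below compares the two-step approach with a single-call variant on a small made-up data frame (the two-digit row and the names two_step and one_step are assumptions for illustration).

```r
# Hypothetical data frame; the column is a factor, as in the dput data above
DF <- data.frame(
  col = factor(c("tr_1 A1; gn_1 TG1;", "tr_10 A10; gn_10 TG10;"))
)

# Two-step version: "gn_." allows only one character after "gn_",
# so the two-digit row is not trimmed and only loses its semicolons
two_step <- gsub(";", "", gsub("^tr.*gn_. ", "", DF$col))
print(two_step)

# Single-call variant: \\d+ accepts ids of any length,
# and the ";$" alternative drops the trailing semicolon
one_step <- gsub("^.*gn_\\d+\\s+|;$", "", as.character(DF$col))
print(one_step)
```

The single-call form relies on alternation, so gsub() removes both the leading prefix and the trailing semicolon in one pass.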

3 answers

solution1
3 2019-03-13 19:02:57

solution2
1 2019-03-13 19:13:52

solution3
0 2019-03-13 19:21:19
