How can I select two characters in a string?

Question

I know this is probably easy to solve, but after looking through various examples online I have not found one that fits my problem.

I have a data.frame with a column that contains strings like the following:

ID
p_IIJSJ;o_OODJ;l_jjjjw;g_jjjdI
p_HHDU;o_WWj;l_WWOJ;g_jjjDI

I would like to keep two tokens, the one that starts with p_ and the one that starts with g_, and remove everything between them. Do you have any suggestions on how to do this? I have been trying with gsub, but with no success so far. Thanks a lot in advance.

Answer 1

An approach with strsplit:

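# split each ID on ';', keep the tokens containing p_ or g_, then re-join them
sapply(strsplit(df$ID, ';'), function(i) paste(grep('p_|g_', i, value = TRUE), collapse = ';'))
#[1] "p_IIJSJ;g_jjjdI" "p_HHDU;g_jjjDI"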

Or, if the order is always the same (as @Jaap mentions), pick the first and fourth tokens by position:

sapply(strsplit(df$ID,';'), function(x) paste(x[c(1,4)], collapse=';'))
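The split-and-filter idea above can be wrapped up so that it works on the whole column and only matches tokens that actually begin with the prefix. This is just a sketch: keep_tokens is a hypothetical helper name, the new column ID_short is my own choice, and the data frame is rebuilt from the strings in the question.

```r
# Sample data copied from the question
df <- data.frame(ID = c("p_IIJSJ;o_OODJ;l_jjjjw;g_jjjdI",
                        "p_HHDU;o_WWj;l_WWOJ;g_jjjDI"),
                 stringsAsFactors = FALSE)

# keep_tokens is a hypothetical helper, not part of any package.
# It splits each string on ';' and keeps only the tokens that START
# with one of the given prefixes (the '^' anchor avoids matching a
# token such as "xp_abc" that merely contains "p_").
keep_tokens <- function(x, prefixes = c("p_", "g_")) {
  pattern <- paste0("^(", paste(prefixes, collapse = "|"), ")")
  vapply(strsplit(x, ";", fixed = TRUE),
         function(tokens) paste(grep(pattern, tokens, value = TRUE), collapse = ";"),
         character(1))
}

df$ID_short <- keep_tokens(df$ID)
df$ID_short
```

Unlike the positional version, this does not assume that p_ is always first and g_ always fourth.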

Answer 2

I suggest the stringr package, which makes this easy:

library(stringr)

a <- "p_IIJSJ;o_OODJ;l_jjjjw;g_jjjdI"
b <- "p_HHDU;o_WWj;l_WWOJ;g_jjjDI"

str_extract(string = a, pattern = c("p_[a-zA-Z]+", "g_[a-zA-Z]+"))

# [1] "p_IIJSJ" "g_jjjdI"

str_extract(string = b, pattern = c("p_[a-zA-Z]+", "g_[a-zA-Z]+"))

# [1] "p_HHDU"  "g_jjjDI"
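One thing to watch: str_extract is vectorised over both string and pattern, so passing a whole column together with a two-element pattern would pair the first string with the first pattern and the second string with the second pattern. A possible way to handle a full column, assuming the tokens consist of letters only as in the question, is str_extract_all with a single pattern (the variable names ids, tokens and result are my own):

```r
library(stringr)

# Strings copied from the question
ids <- c("p_IIJSJ;o_OODJ;l_jjjjw;g_jjjdI",
         "p_HHDU;o_WWj;l_WWOJ;g_jjjDI")

# One pattern for both prefixes; \\b makes sure the p or g starts a token.
# str_extract_all returns a list with one character vector per input string.
tokens <- str_extract_all(ids, "\\b[pg]_[A-Za-z]+")

# Re-join the extracted tokens of each string with ';'
result <- vapply(tokens, paste, character(1), collapse = ";")
result
```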

Answer 3

We can use sub

sub(";*(p_\\w+).*;*(g_\\w+).*", "\\1;\\2", df1$ID)
#[1] "p_IIJSJ;g_jjjdI" "p_HHDU;g_jjjDI"

Or with gsub

gsub("[^pg]_\\w+;", "", df1$ID)
#[1] "p_IIJSJ;g_jjjdI" "p_HHDU;g_jjjDI"

data

df1 <- structure(list(ID = c("p_IIJSJ;o_OODJ;l_jjjjw;g_jjjdI", "p_HHDU;o_WWj;l_WWOJ;g_jjjDI"
)), .Names = "ID", class = "data.frame", row.names = c(NA, -2L))
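As a quick check, the sub and gsub patterns above can be compared on the sample data. The sketch below also tries one extra string, "p_A;g_B;o_C", which is a made-up input and not from the question, to show where the two differ: the gsub pattern only removes unwanted tokens that are followed by a semicolon, so it would leave a trailing unwanted token in place.

```r
# Strings copied from the question
ids <- c("p_IIJSJ;o_OODJ;l_jjjjw;g_jjjdI",
         "p_HHDU;o_WWj;l_WWOJ;g_jjjDI")

# sub: capture the p_ token and the g_ token, replace the whole match with them
by_sub <- sub(";*(p_\\w+).*;*(g_\\w+).*", "\\1;\\2", ids)

# gsub: delete every "<letter other than p or g>_token;" piece
by_gsub <- gsub("[^pg]_\\w+;", "", ids)

# Both give the same result on the sample data
same_result <- identical(by_sub, by_gsub)

# Made-up input where an unwanted token comes last (no trailing ';')
extra <- "p_A;g_B;o_C"
extra_sub  <- sub(";*(p_\\w+).*;*(g_\\w+).*", "\\1;\\2", extra)
extra_gsub <- gsub("[^pg]_\\w+;", "", extra)
```

Note also that the sub pattern expects the p_ token to appear before the g_ token, as it does in the question's data.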
