How to subset a string in R

Question

Dear all, I have a vector of strings like:

LOCAT01PE
WECAT013EJD
AFECAT0155DR

From each value I want to extract only CAT and all the digits that follow it:

CAT01
CAT013
CAT0155

I have tried substr, but it does not work because neither the number of characters before CAT nor the number of digits after it is fixed.

Answer 1

In base R, we can use sub to extract "CAT" followed by numbers.

x <- c('LOCAT01PE', 'WECAT013EJD', 'AFECAT0155DR')
# capture "CAT" plus the digits after it; '.*' also allows strings that start with "CAT"
sub('.*(CAT\\d+).*', '\\1', x)
#[1] "CAT01"   "CAT013"  "CAT0155"

Or, similarly, with str_extract from the stringr package:

stringr::str_extract(x, "CAT\\d+")
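
One thing to keep in mind with the sub approach: when an element contains no "CAT" plus digits, sub returns that element unchanged rather than a missing value. A minimal base R sketch of one way to turn such cases into NA, assuming a hypothetical extra value 'NOMATCH' that is not part of the original data:

```r
# 'NOMATCH' is a hypothetical value added only to illustrate the no-match case
x <- c('LOCAT01PE', 'WECAT013EJD', 'NOMATCH')

# sub() leaves elements without a match untouched
res <- sub('.*(CAT\\d+).*', '\\1', x)

# keep the extracted value only where the pattern is actually present
res_na <- ifelse(grepl('CAT\\d+', x), res, NA_character_)
print(res)
print(res_na)
```

Here grepl is used as a guard so the result stays the same length as the input.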

Answer 2

We can also use substr with regexpr to identify relevant start/stop points in the string:

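vec <- c('LOCAT01PE', 'WECAT013EJD', 'AFECAT0155DR')

# start at "CAT"; stop at the first digit that is directly followed by a letter
substr(vec,
       start = regexpr('CAT', vec),
       stop = regexpr('\\d[a-zA-Z]', vec)
       )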

Output:

[1] "CAT01"   "CAT013"  "CAT0155"
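
Note that the stop position above relies on a letter following the digits. As a possible variation (a sketch, not part of the original answer), the match length that regexpr reports can be used to compute the stop position instead. The value 'XXCAT42' is a hypothetical string that ends in digits, added only for illustration:

```r
# 'XXCAT42' is a hypothetical value with no letter after the digits
vec <- c('LOCAT01PE', 'WECAT013EJD', 'AFECAT0155DR', 'XXCAT42')

m <- regexpr('CAT\\d+', vec)          # start position of each match
start <- as.integer(m)
len <- attr(m, 'match.length')        # number of characters matched

out <- substr(vec, start = start, stop = start + len - 1)
print(out)
```

This keeps the substr idea but takes both ends of the match from a single regexpr call.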

Answer 3

We can use regexpr/regmatches in base R. The pattern matches the word 'CAT', followed by an optional hyphen (-?), followed by one or more digits (\\d+).

regmatches(x, regexpr("CAT-?\\d+", x))
#[1] "CAT01"    "CAT013"   "CAT0155"  "CAT-01"   "CAT-013"  "CAT-0155"

data

x <- c('LOCAT01PE', 'WECAT013EJD', 'AFECAT0155DR', 
    'LO-CAT-01PE', 'WE-CAT-013-EJD', 'AFE-CAT-0155-DR')
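
A caveat with regmatches: elements without a match are dropped, so the result can be shorter than the input. A minimal sketch of one way to keep the output aligned with the input, assuming a hypothetical non-matching value 'NOMATCH':

```r
# 'NOMATCH' is a hypothetical value added only to illustrate the no-match case
x <- c('LOCAT01PE', 'LO-CAT-01PE', 'NOMATCH')

m <- regexpr('CAT-?\\d+', x)          # -1 where there is no match

# pre-fill with NA, then write the matches into the positions that matched
out <- rep(NA_character_, length(x))
out[m != -1] <- regmatches(x, m)
print(out)
```

This is useful when the extracted values are assigned back to a data frame column, where the lengths have to agree.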

3 answers

solution1
2 2020-03-07 11:32:31

solution2
1 2020-03-07 11:46:30

solution3
1 ACCEPTED 2020-03-07 18:28:06
