Dear all, I have a vector of strings like:
LOCAT01PE
WECAT013EJD
AFECAT0155DR
I want to subset each value to obtain only CAT and all the digits after it:
CAT01
CAT013
CAT0155
I have tried to use the command substr, but it won't work on its own, since neither the number of characters before CAT nor the number of digits after CAT is fixed.
In base R, we can use sub to extract "CAT" followed by digits.
x <- c('LOCAT01PE', 'WECAT013EJD', 'AFECAT0155DR')
sub('.*(CAT\\d+).*', '\\1', x)
#[1] "CAT01" "CAT013" "CAT0155"
Or similarly with str_extract from the stringr package:
stringr::str_extract(x, "CAT\\d+")
We can also use substr with regexpr to identify the relevant start/stop points in the string:
substr(x,
  start = regexpr('CAT', x),          # position where "CAT" begins
  stop = regexpr('\\d[a-zA-Z]', x)    # position of the first digit followed by a letter
)
Output:
[1] "CAT01" "CAT013" "CAT0155"
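Note that the stop position in this approach relies on a letter directly following the digits. A possible variant, sketched here under the assumption that the target is always CAT followed by digits, takes both endpoints from a single regexpr call via its match.length attribute. The extra element 'XCAT42' and the names m and res are made up for illustration:

```r
# 'XCAT42' is a hypothetical input with no letter after the digits
x <- c('LOCAT01PE', 'WECAT013EJD', 'AFECAT0155DR', 'XCAT42')

# regexpr returns the start of each match and stores its length as an attribute
m <- regexpr('CAT\\d+', x)

# stop = start + length - 1, so no trailing letter is needed
res <- substr(x, start = m, stop = m + attr(m, 'match.length') - 1)
res
#[1] "CAT01" "CAT013" "CAT0155" "CAT42"
```

With the two-pattern version, 'XCAT42' would give an empty string, because '\\d[a-zA-Z]' finds no match and regexpr returns -1 for the stop position.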
We can use regexpr/regmatches in base R. The pattern matches 'CAT', followed by an optional - (the ? makes it optional), and then one or more digits (\\d+):
x <- c('LOCAT01PE', 'WECAT013EJD', 'AFECAT0155DR',
       'LO-CAT-01PE', 'WE-CAT-013-EJD', 'AFE-CAT-0155-DR')
regmatches(x, regexpr("CAT-?\\d+", x))
#[1] "CAT01" "CAT013" "CAT0155" "CAT-01" "CAT-013" "CAT-0155"
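One thing to watch with regmatches: elements without a match are dropped, so the result can be shorter than the input. If the output needs to stay aligned with the input, one option is to fill the non-matches with NA. This is only a sketch; the element 'NOMATCH' and the names m and out are made up for illustration:

```r
# 'NOMATCH' is a hypothetical element that does not contain CAT + digits
x <- c('LOCAT01PE', 'WECAT013EJD', 'NOMATCH')

# regexpr returns -1 for elements where the pattern is not found
m <- regexpr('CAT-?\\d+', x)

# start with all NA, then fill in only the positions that matched
out <- rep(NA_character_, length(x))
out[m != -1] <- regmatches(x, m)
out
#[1] "CAT01" "CAT013" NA
```

This gives the same length as the input, which is also how stringr::str_extract behaves for non-matching elements.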