I have a vector of strings that look like this:
a - bc/def_g - A/mn/us/ww
opq - rs/ts_uf - BC/wx/yza
Abc - so/dhie7u - XYZ/En/xy/jkq - QWNE
I'd like to get the text after the 2nd dash (-) but before the first slash (/) that follows it, i.e. the result should look like
A
BC
XYZ
What is the best way to do it? (The vector has more than 500K rows.)
Thanks
Suppose your string is defined like this:
string <- c("a - bc/def_g - A/mn/us/ww",
"opq - rs/ts_uf - BC/wx/yza",
"Abc - so/dhie7u - XYZ/En/xy/jkq - QWNE")
Then you can use sub:
> sub(".*\\-\\s+([A-Z]+)/.*", "\\1", string)
[1] "A" "BC" "XYZ"
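The pattern above relies on the wanted text being uppercase letters only and on the greedy `.*` backtracking to the right dash. If that cannot be assumed for all 500K rows, a possible variant is to anchor on the second dash explicitly. This is only a sketch: the variable name `res` is introduced here for illustration, and it assumes every row contains at least two dashes followed by a slash (rows that do not match are returned unchanged by `sub`).

```r
string <- c("a - bc/def_g - A/mn/us/ww",
            "opq - rs/ts_uf - BC/wx/yza",
            "Abc - so/dhie7u - XYZ/En/xy/jkq - QWNE")

# Skip everything up to and including the second dash, skip whitespace,
# capture everything up to the next slash, and discard the rest.
res <- sub("^[^-]*-[^-]*-\\s*([^/]+)/.*$", "\\1", string)
res
```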
^[^-]*-[^-]*-\s*\K[^/]+

^        Assert position at the start of the line
[^-]*    Match any character except -, any number of times
-        Match this literally
[^-]*    Match any character except -, any number of times
-        Match this literally
\s*      Match any number of whitespace characters
\K       Resets the starting point of the pattern. Any previously consumed characters are no longer included in the final match
[^/]+    Match any character except /, one or more times

Alternatively, as suggested by Jan in the comments below (I believe it has since been deleted), ^(?:[^-]*-){2}\s*\K[^/]+ may be used. It's shorter and easily scalable, but adds more steps.
x <- c("a - bc/def_g - A/mn/us/ww", "opq - rs/ts_uf - BC/wx/yza", "Abc - so/dhie7u - XYZ/En/xy/jkq - QWNE")
m <- regexpr("^[^-]*-[^-]*-\\s*\\K[^/]+", x, perl = TRUE)
regmatches(x, m)
Result: [1] "A" "BC" "XYZ"
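One thing to watch with a large vector: `regmatches` returns only the elements that matched, so the result can be shorter than the input if some rows have fewer than two dashes. The sketch below keeps the output aligned with the input by filling non-matching rows with `NA`, and uses the shorter `{n}` form of the pattern so the dash count is a parameter. The function name `extract_after_nth_dash` and its argument `n` are hypothetical helpers introduced here, not part of the original answer.

```r
# Hypothetical helper: text after the n-th dash, up to the next slash.
# Rows without a match (or NA inputs) yield NA instead of being dropped.
extract_after_nth_dash <- function(x, n = 2L) {
  pattern <- sprintf("^(?:[^-]*-){%d}\\s*\\K[^/]+", as.integer(n))
  m <- regexpr(pattern, x, perl = TRUE)
  hit <- !is.na(m) & m > -1L          # rows where the pattern matched
  out <- rep(NA_character_, length(x))
  out[hit] <- regmatches(x, m)        # regmatches returns matched rows only
  out
}

x <- c("a - bc/def_g - A/mn/us/ww",
       "opq - rs/ts_uf - BC/wx/yza",
       "Abc - so/dhie7u - XYZ/En/xy/jkq - QWNE",
       "no dashes here")
res_n <- extract_after_nth_dash(x)
res_n
```

With the four inputs above, the first three rows give "A", "BC" and "XYZ", and the last row gives NA, so the result has the same length as `x`.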