Keep text between 2nd dash and first slash in R

Question

I have a vector of strings that look like this:

a - bc/def_g  - A/mn/us/ww
opq - rs/ts_uf - BC/wx/yza
Abc - so/dhie7u - XYZ/En/xy/jkq - QWNE

I'd like to get the text after the 2nd dash (-) but before the first slash (/), i.e. the result should look like:

A
BC
XYZ

What is the best way to do this? The vector has more than 500K rows.

Thanks

Answer 1

Suppose your string is defined like this:

string <- c("a - bc/def_g  - A/mn/us/ww", 
            "opq - rs/ts_uf - BC/wx/yza", 
            "Abc - so/dhie7u - XYZ/En/xy/jkq - QWNE")

Then you can use sub, capturing the run of uppercase letters that follows a dash and precedes a slash:

> sub(".*\\-\\s+([A-Z]+)/.*", "\\1", string)
[1] "A"   "BC"  "XYZ"
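
Note that the pattern above only captures uppercase letters (A-Z). If the wanted field can also hold lowercase letters or digits, a variant along the following lines might be used instead. This is only a sketch: it assumes every element contains at least two dashes, and the input "x - y/z - ab1/cd" is a made-up example that is not in the question.

```r
string <- c("a - bc/def_g  - A/mn/us/ww",
            "opq - rs/ts_uf - BC/wx/yza",
            "Abc - so/dhie7u - XYZ/En/xy/jkq - QWNE")

# Anchor at the start, skip past the first two dashes and any whitespace,
# then capture everything up to (not including) the first slash.
pattern <- "^[^-]*-[^-]*-\\s*([^/]+).*$"

res <- sub(pattern, "\\1", string, perl = TRUE)
res

# Made-up input with a lowercase/digit field (not from the question).
res_mixed <- sub(pattern, "\\1", "x - y/z - ab1/cd", perl = TRUE)
res_mixed
```

Keep in mind that sub returns an element unchanged when the pattern does not match, so malformed rows pass through silently.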

Answer 2

The regex (PCRE):

^[^-]*-[^-]*-\s*\K[^/]+

^ Assert position at the start of the line
[^-]* Match any character except - any number of times
- Match this literally
[^-]* Match any character except - any number of times
- Match this literally
\s* Match any number of whitespace characters
\K Resets the starting point of the pattern. Any previously consumed characters are no longer included in the final match
[^/]+ Match any character except / one or more times

Alternatively, as suggested by Jan in the comments below (I believe the comment has since been deleted), ^(?:[^-]*-){2}\s*\K[^/]+ may be used. It's shorter and easily scalable, but adds more steps.
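
To illustrate the "easily scalable" point, the repetition count can be made a parameter. The helper below is a hypothetical sketch (the name after_nth_dash and its arguments are not from the original answers). It also returns NA for elements that do not match, since regmatches on its own drops them, which may matter on a 500K-row vector.

```r
# Hypothetical helper: text after the n-th dash, up to the next slash.
after_nth_dash <- function(x, n = 2L) {
  # Build e.g. "^(?:[^-]*-){2}\\s*\\K[^/]+" for n = 2
  pattern <- sprintf("^(?:[^-]*-){%d}\\s*\\K[^/]+", n)
  m <- regexpr(pattern, x, perl = TRUE)

  # regmatches() returns only the matched elements, so fill a
  # full-length result and leave non-matching positions as NA.
  out <- rep(NA_character_, length(x))
  out[m != -1L] <- regmatches(x, m)
  out
}

x <- c("a - bc/def_g  - A/mn/us/ww",
       "opq - rs/ts_uf - BC/wx/yza",
       "Abc - so/dhie7u - XYZ/En/xy/jkq - QWNE")

after_nth_dash(x)        # text after the 2nd dash
after_nth_dash(x, n = 1) # text after the 1st dash
```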

In R:

x <- c("a - bc/def_g  - A/mn/us/ww", "opq - rs/ts_uf - BC/wx/yza", "Abc - so/dhie7u - XYZ/En/xy/jkq - QWNE")
m <- regexpr("^[^-]*-[^-]*-\\s*\\K[^/]+", x, perl = TRUE)
regmatches(x, m)

Result: [1] "A" "BC" "XYZ"
