I searched for an answer but couldn't find one. My question is a bit peculiar and I'm still learning regex. I'd like to get, starting from this:
str1 <- 'txt1/txt2/123|abc_def_123|1991-02-11'
something like this:
txt1|abc_def_123|1991-02-11
So everything from the first / (inclusive) up to the first occurrence of | (exclusive) should be removed. So far I wrote this:
sub("\\/.*\\|", "|", str1 )
but it removes everything up to the last occurrence of |:
"txt1|1991-02-11"
How can I specify that the substring should be removed only up to the first occurrence of |?
You can use /[^|]*, which matches the first / and everything after it as long as it is not a |.
sub("/[^|]*", "", str1)
#[1] "txt1|abc_def_123|1991-02-11"
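Since sub() is vectorised and replaces only the first match in each element, the same call should work on a whole character vector. A small sketch with a few made-up strings (the vector x is purely illustrative), also contrasting sub() with gsub(), which would strip every "/..." run:

```r
# Hypothetical sample records, for illustration only
x <- c('txt1/txt2/123|abc_def_123|1991-02-11',
       'a/b|c/d|e',
       'noslash|x|y')

# sub(): only the first "/..." run in each element is removed;
# elements without any "/" are returned unchanged
first_only <- sub("/[^|]*", "", x)
# first_only is c("txt1|abc_def_123|1991-02-11", "a|c/d|e", "noslash|x|y")

# gsub(): every "/..." run is removed, including "/d" in the second field
all_runs <- gsub("/[^|]*", "", x)
# all_runs is c("txt1|abc_def_123|1991-02-11", "a|c|e", "noslash|x|y")
```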
Following your attempt, you can make the .* lazy by writing .*?:
sub("/.*?\\|", "|", str1 )
#[1] "txt1|abc_def_123|1991-02-11"
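To see the difference between the greedy and the lazy quantifier, it may help to extract what each pattern actually matches with regexpr() and regmatches(). This sketch passes perl = TRUE so the PCRE engine is used; that choice is an assumption for the illustration, not part of the answer above:

```r
str1 <- 'txt1/txt2/123|abc_def_123|1991-02-11'

# Greedy: .* takes as much as possible, so the match ends at the LAST "|"
greedy <- regmatches(str1, regexpr("/.*\\|", str1, perl = TRUE))

# Lazy: .*? takes as little as possible, so the match ends at the FIRST "|"
lazy <- regmatches(str1, regexpr("/.*?\\|", str1, perl = TRUE))

print(greedy)  # [1] "/txt2/123|abc_def_123|"
print(lazy)    # [1] "/txt2/123|"
```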
You could use a negated character class with a positive lookahead (in R, lookarounds require perl = TRUE):
/[^|]*(?=\|)
To avoid matching across newlines, you could extend the negated character class:
/[^|\r\n]*(?=\|)
str1 <- 'txt1/txt2/123|abc_def_123|1991-02-11'
sub("/[^|]*(?=\\|)", "", str1, perl=TRUE)
Output
[1] "txt1|abc_def_123|1991-02-11"
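A quick sketch of what the lookahead and the newline-excluding class change in practice. The inputs "a/b" and "x/y\nz|w" are made-up edge cases chosen only to show the behaviour:

```r
# Case 1: no "|" after the slash.
# The lookahead requires a "|" to follow, so nothing is removed;
# the plain negated class has no such requirement and strips "/b".
no_pipe <- "a/b"
with_lookahead <- sub("/[^|]*(?=\\|)", "", no_pipe, perl = TRUE)  # "a/b"
without_lookahead <- sub("/[^|]*", "", no_pipe)                   # "a"

# Case 2: a newline between the slash and the "|".
# [^|] also matches "\n", so the first pattern removes across the line break;
# [^|\r\n] stops at the newline, the lookahead fails, and nothing is removed.
multi_line <- "x/y\nz|w"
crosses <- sub("/[^|]*(?=\\|)", "", multi_line, perl = TRUE)       # "x|w"
stays <- sub("/[^|\\r\\n]*(?=\\|)", "", multi_line, perl = TRUE)   # unchanged
```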
Another approach is to use backreferences:
sub("(^.*)/.*/.*?(\\|.*$)", "\\1\\2", str1)
[1] "txt1|abc_def_123|1991-02-11"
Here, the replacement \\1\\2 reinserts the text matched by the two capturing groups (...), while the parts of the match outside those groups are removed.
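One thing worth checking: the pattern above contains exactly two literal slashes, and the greedy (^.*) in group 1 absorbs any earlier ones. A sketch with a hypothetical input that has three slashes before the first |, next to a variant (my own suggestion, not from the answer) that anchors group 1 to "no slash" so only the text before the first / is kept. perl = TRUE is used for the original pattern to pin down the greedy/lazy semantics:

```r
# Hypothetical input with three slashes before the first "|"
y <- 'a/b/c/d|x|y'

# Answer's pattern: greedy (^.*) keeps "a/b", since the pattern
# only needs the last two slashes to match its two literal "/"
original <- sub("(^.*)/.*/.*?(\\|.*$)", "\\1\\2", y, perl = TRUE)
# original is "a/b|x|y"

# Variant: group 1 may not contain "/", so it stops at the FIRST slash,
# and [^|]* then consumes everything up to the first "|"
anchored <- sub("^([^/]*)/[^|]*(\\|.*)$", "\\1\\2", y)
# anchored is "a|x|y"
```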