
Remove substring from string between special characters in R

I looked for an answer but couldn't find one. My question is a bit peculiar, and I'm still learning regex. Starting from this:

str1 <- 'txt1/txt2/123|abc_def_123|1991-02-11'

something like this:

txt1|abc_def_123|1991-02-11

I want to remove everything from the first / (inclusive) up to, but not including, the first occurrence of |. So far I have written this:

sub("\\/.*\\|", "|", str1 )

but it removes everything up to the last occurrence of |:

"txt1|1991-02-11"

How can I make it remove the substring only up to the first occurrence of |?
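To see why the greedy pattern overshoots, it helps to look at the text it actually matches. This sketch uses base R's regexpr() and regmatches() (not part of the original question) to extract that match:

```r
str1 <- 'txt1/txt2/123|abc_def_123|1991-02-11'

# regexpr() locates the first match; regmatches() extracts the matched text.
# The greedy .* runs to the end of the string and then backtracks only as far
# as the LAST "|", so the match spans both pipe-delimited fields.
greedy <- regmatches(str1, regexpr("\\/.*\\|", str1))
greedy
#[1] "/txt2/123|abc_def_123|"
```

Replacing that whole span with a single "|" is what produces "txt1|1991-02-11".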

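You can use /[^|]*, which matches the first / plus every following character up to (but not including) the next |. Because sub() replaces only the first match, the rest of the string is left untouched.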

sub("/[^|]*", "", str1)
#[1] "txt1|abc_def_123|1991-02-11"
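Since sub() is vectorised, the same pattern can be applied to a whole character vector. The inputs other than str1 below are made up for illustration. The last one shows a caveat: the pattern does not require the / to appear before the first |, so a / inside a later field is matched as well.

```r
# Hypothetical inputs; only the first one comes from the question
x <- c('txt1/txt2/123|abc_def_123|1991-02-11',
       'a/b|c|d',
       'no_slash|c|d',
       'x|y/z')

# "/[^|]*" = a "/" followed by any run of characters other than "|".
# sub() replaces only the first match in each element; elements without
# a "/" are returned unchanged.
res <- sub("/[^|]*", "", x)
res
# res is c("txt1|abc_def_123|1991-02-11", "a|c|d", "no_slash|c|d", "x|y")
```

If a / can legitimately occur after the first |, anchoring the pattern (for example "^([^/|]*)/[^|]*" with "\\1" as the replacement) would be one way to restrict it to the first field; that variant is a suggestion, not part of the answer.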

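Following your attempt, you can make your regex lazy: .*? matches as few characters as possible, so the match stops at the first | instead of the last one.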

sub("/.*?\\|", "|", str1 )
#[1] "txt1|abc_def_123|1991-02-11"
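Extracting the lazy match shows where it stops. This sketch passes perl = TRUE so that PCRE's lazy-quantifier semantics apply; the answer's own sub() call uses R's default engine.

```r
str1 <- 'txt1/txt2/123|abc_def_123|1991-02-11'

# With the lazy quantifier .*? the match ends at the FIRST "|" after "/",
# so only "/txt2/123|" gets replaced (by the single "|" in the replacement).
lazy <- regmatches(str1, regexpr("/.*?\\|", str1, perl = TRUE))
lazy
#[1] "/txt2/123|"
```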

You could use a negated character class with a positive lookahead:

/[^|]*(?=\|)

To avoid matching across newlines, you could extend the negated character class:

/[^|\r\n]*(?=\|)
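A small made-up input shows the difference: the / is on the first line and the | only on the second. Both calls use perl = TRUE because the lookahead requires it.

```r
# Hypothetical input: "/" on line 1, "|" only on line 2
str2 <- "a/b\nc|d"

# [^|] also matches "\n", so this match crosses the line break
r1 <- sub("/[^|]*(?=\\|)", "", str2, perl = TRUE)
r1  # "a|d"

# Excluding \r and \n stops the run at the end of the line; the lookahead
# then fails, so nothing is replaced
r2 <- sub("/[^|\\r\\n]*(?=\\|)", "", str2, perl = TRUE)
r2  # "a/b\nc|d" (unchanged)
```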


str1 <- 'txt1/txt2/123|abc_def_123|1991-02-11'
sub("/[^|]*(?=\\|)", "", str1, perl=TRUE)

Output

[1] "txt1|abc_def_123|1991-02-11"
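Lookarounds such as (?=\|) are a PCRE feature, which is why the sub() call passes perl = TRUE. Extracting the match (here with regexpr() and regmatches()) shows that the lookahead checks for the | without consuming it, so the replacement can be an empty string:

```r
str1 <- 'txt1/txt2/123|abc_def_123|1991-02-11'

# The matched text ends just before the "|"; the pipe itself is only
# asserted by the lookahead and stays in the string after replacement.
m <- regmatches(str1, regexpr("/[^|]*(?=\\|)", str1, perl = TRUE))
m
#[1] "/txt2/123"
```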

Another approach is to use backreferences:

sub("(^.*)/.*/.*?(\\|.*$)", "\\1\\2", str1)
[1] "txt1|abc_def_123|1991-02-11"

Here, \\1\\2 in the replacement reinserts the text captured by the two capturing groups (...), while the parts of the match outside those groups are removed.
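The backreference pattern and the negated-class pattern agree on str1, but they are not equivalent in general. On a made-up input with three / before the first |, the greedy (^.*) keeps everything up to the second-to-last /. perl = TRUE is passed here as an assumption, to pin down the backtracking behaviour:

```r
# Hypothetical input with three "/" before the first "|"
s <- 'a/b/c/d|e|f'

# Negated class: removes from the FIRST "/" up to the first "|"
neg <- sub("/[^|]*", "", s)

# Backreference pattern: (^.*) is greedy, so group 1 keeps as much as it
# can while still leaving two "/" for the rest of the pattern
bref <- sub("(^.*)/.*/.*?(\\|.*$)", "\\1\\2", s, perl = TRUE)

neg   # "a|e|f"
bref  # "a/b|e|f"
```

Which result is "right" depends on the data; for the question's stated rule (remove from the first /), the negated class or the lazy pattern matches it more directly.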

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address. Any question please contact: yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM