
R str_extract everything before and after ellipsis

I'm trying to find a way to split a character column with an ellipsis in the middle into two columns, everything before the ellipsis and everything after.

For example, if I have:

a <- "60.4 (b)(33) and (e)(1) revised....................................46111"

How do I split that into "60.4 (b)(33) and (e)(1) revised" and "46111"?

I have tried:

str_extract(a, ".*\\.{2,}")

for the first part, and for the second part:

str_extract(a, "\\.{2,}.*")

but that keeps the ellipsis in both, which I'd like to drop.

It seems you want to split, not extract, using a pattern that matches two or more consecutive dots:

a <- "60.4 (b)(33) and (e)(1) revised....................................46111"
unlist(stringr::str_split(a, "\\.{2,}"))
## => [1] "60.4 (b)(33) and (e)(1) revised" "46111"                          

## Base R strsplit:
unlist(strsplit(a, "\\.{2,}"))
## => [1] "60.4 (b)(33) and (e)(1) revised" "46111"
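
Since the question is about a column rather than a single string, here is a minimal base R sketch of the same split applied to a data frame. The data frame df, its column names (entry, text, number) and the second row are hypothetical, made up for illustration, and the sketch assumes every value contains exactly one run of two or more dots:

```r
# Hypothetical data frame; the second row is invented for illustration.
df <- data.frame(
  entry = c(
    "60.4 (b)(33) and (e)(1) revised....................................46111",
    "60.5 (a) amended..........46200"
  ),
  stringsAsFactors = FALSE
)

# Split each string on a run of two or more dots, then bind the
# resulting pieces row-wise into a two-column character matrix.
parts <- do.call(rbind, strsplit(df$entry, "\\.{2,}"))

# Assumed column names: text before the dots, number after them.
df$text   <- parts[, 1]
df$number <- parts[, 2]

df[, c("text", "number")]
```

If some rows might contain no dot run, or more than one, the pieces would no longer line up into two columns, so it would be worth checking lengths(strsplit(...)) first.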

Another possible splitting regex matches one or more dots that are followed by one or more digits at the end of the string:

unlist(stringr::str_split(a, "\\.+(?=\\d+$)"))
unlist(strsplit(a, "\\.+(?=\\d+$)", perl=TRUE))

Both yield the same output, [1] "60.4 (b)(33) and (e)(1) revised" "46111". Here, \\.+ matches one or more dots and (?=\\d+$) is a positive lookahead that matches a location immediately followed by one or more digits (\\d+) and then the end of the string ($). The base R call needs perl=TRUE because the default regex engine used by strsplit does not support lookaheads.
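
To illustrate where the two splitting patterns can differ, here is a small sketch with a made-up string b (not from the question) that has an extra run of dots in the middle of the text. The plain \\.{2,} pattern splits at every run, while the lookahead pattern splits only at the run that precedes the trailing digits:

```r
# Hypothetical input with an extra run of dots inside the text part.
b <- "Sec. 1...3 revised....46111"

# Splits at every run of two or more dots.
every_run <- unlist(strsplit(b, "\\.{2,}"))
every_run
## => [1] "Sec. 1"    "3 revised" "46111"

# Splits only at dots that are followed by digits and the end of string.
last_run <- unlist(strsplit(b, "\\.+(?=\\d+$)", perl = TRUE))
last_run
## => [1] "Sec. 1...3 revised" "46111"
```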

Another approach is to match rather than split, using str_match to capture the parts you need:

res <- stringr::str_match(a, "^(.*?)\\.+(\\d+)$")
res[,-1]
# => [1] "60.4 (b)(33) and (e)(1) revised" "46111" 

Here,

  • ^ - matches the start of string
  • (.*?) - Group 1: any zero or more chars other than line break chars, as few as possible
  • \\.+ - one or more dots
  • (\\d+) - Group 2: one or more digits
  • $ - end of string.

str_match returns a character matrix whose first column holds the full match, so res[,-1] drops that column and keeps only the two captured groups.
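
If you would rather avoid the stringr dependency, the same capturing pattern can be sketched in base R with regexec and regmatches. This is an alternative to the str_match call, not part of it; perl = TRUE is used here so the lazy quantifier is handled by the PCRE engine:

```r
a <- "60.4 (b)(33) and (e)(1) revised....................................46111"

# regexec() records the positions of the full match and of each group;
# regmatches() turns them into a list with one character vector per input.
m <- regmatches(a, regexec("^(.*?)\\.+(\\d+)$", a, perl = TRUE))[[1]]

# The first element is the full match, so drop it to keep the two groups.
parts <- m[-1]
parts
## => [1] "60.4 (b)(33) and (e)(1) revised" "46111"
```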

