
regex select multiple groups

I have the following string, from which I want to extract the content between the second pair of colons (ES5206956802492 in the example):

"20160607181026_0000005:0607181026000000501:ES5206956802492:479"

I am using R and specifically the stringr package to manipulate strings. The command I attempted to use is:

str_extract("20160607181026_0000005:0607181026000000501:ES5206956802492:479", ":(.*):")

where the regex pattern is expressed at the end of the command. This produces the following result:

":0607181026000000501:ES5206956802492:"

I know that there is a way of grouping results and back-referencing them, which would allow me to select only the part I am interested in, but I can't figure out the right syntax.

How can I achieve this?

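Another option is word from stringr, which splits the string on a separator and returns the requested piece (v1 is defined under "data" at the end):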

library(stringr)
word(v1, 3, sep=':')
#[1] "ES5206956802492"

If the field we want starts with an uppercase letter right after the : , then we can use a compact regex. Here, we use a regex lookbehind ( (?<=:) ) and match an uppercase letter ( [A-Z] ) that follows the : , followed by one or more characters that are not a : ( [^:]+ ). str_extract returns the first such match.

str_extract(v1, "(?<=:)[A-Z][^:]+")
#[1] "ES5206956802492"
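The same lookbehind pattern should also work without stringr. The following is a minimal sketch, assuming base R's PCRE engine is switched on via perl = TRUE (lookbehind is not available in the default engine); the variable name field_lookbehind is only illustrative.

```r
v1 <- "20160607181026_0000005:0607181026000000501:ES5206956802492:479"

# perl = TRUE is needed for the lookbehind (?<=:)
m <- regexpr("(?<=:)[A-Z][^:]+", v1, perl = TRUE)

# regmatches() pulls out the matched substring located by regexpr()
field_lookbehind <- regmatches(v1, m)
field_lookbehind
#[1] "ES5206956802492"
```

Like str_extract, regexpr only reports the first match, so this relies on no earlier field starting with an uppercase letter.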

Or, if the extraction is based on position, i.e. the field after the 2nd colon, a base R option is sub: match zero or more non-: characters ( [^:]* ) followed by the first : , then zero or more non-: characters followed by the second : , then capture the following run of non-: characters in a group ( (...) ), followed by the rest of the string ( .* ). In the replacement, we use the backreference \\1 (first capture group), so the whole string is replaced by the captured field.

sub("[^:]*:[^:]*:([^:]+).*", "\\1", v1)
#[1] "ES5206956802492"

Or the repeating part can be grouped and quantified to make it compact; the field we want is then the second capture group ( \\2 ):

sub("([^:]*:){2}([^:]+).*", "\\2", v1)
#[1] "ES5206956802492"
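Since the question asks about reading a capture group directly, here is a minimal sketch of that idea in base R, using regexec together with regmatches instead of a substitution. The names groups and field_group are only illustrative.

```r
v1 <- "20160607181026_0000005:0607181026000000501:ES5206956802492:479"

# Skip two colon-terminated fields, then capture the third one
m <- regexec("^[^:]*:[^:]*:([^:]*)", v1)

# For each input string, regmatches() returns the full match
# followed by one element per capture group
groups <- regmatches(v1, m)[[1]]

# Element 1 is the whole match, element 2 is the first capture group
field_group <- groups[2]
field_group
#[1] "ES5206956802492"
```

In stringr, str_match plays a similar role: it returns a character matrix whose first column is the full match and whose remaining columns are the capture groups.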

Or with strsplit, we split at the delimiter : and extract the 3rd element.

strsplit(v1, ":")[[1]][3]
#[1] "ES5206956802492"
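The [[1]][3] indexing above handles a single string. For a vector of strings, a sketch along the following lines may be useful; the two extra inputs ("a:b:c:d" and "only:two") are made up for illustration and are not from the original data.

```r
v1 <- "20160607181026_0000005:0607181026000000501:ES5206956802492:479"

# Hypothetical extra inputs, including one with fewer than three fields
ids <- c(v1, "a:b:c:d", "only:two")

# fixed = TRUE treats ":" as a literal string rather than a regex
parts <- strsplit(ids, ":", fixed = TRUE)

# Take the 3rd piece of each split; indexing past the end gives NA
third_fields <- vapply(parts, function(p) p[3], character(1), USE.NAMES = FALSE)
third_fields
```

Strings with fewer than three fields come back as NA rather than raising an error, which is easy to check afterwards with is.na.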

data

v1 <- "20160607181026_0000005:0607181026000000501:ES5206956802492:479"
