
Positive lookahead + overlapping matches regex

I'm looking for a regex that matches every % not followed by a valid two-character hex code (two characters from 0-9, a-f, A-F). I came up with (%)(?=([0-9a-fA-F][^0-9a-fA-F]|[^0-9a-fA-F])), which works well but is not supported in Go because of the positive lookahead (?=).

How can I translate it (or maybe simplify it) so that it works in Go?

For example, given the string %d%2524e%25f%255E00%%%252611%25, it should match the first % and the first two % of the %%% substring.

See: https://regex101.com/r/y0YQ1I/2

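I only tried this on regex101 (with the Golang flavor selected), but it seems to work as expected. Valid escapes are consumed by the first alternative, so capture group 1 is set only for a % that is not followed by two hex digits: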

%[0-9a-fA-F][0-9a-fA-F]|(%)

or simpler:

%[0-9a-fA-F]{2}|(%)

The real challenge here is that the matches at positions 19 and 20 overlap, which means we can't use any of Go's built-in "FindAll..." functions, since they only find non-overlapping matches. We therefore have to match the regex repeatedly against substrings, each starting one character past the start of the previous match, if we want to find them all.

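For the regex itself I've used a non-capturing group (?:...) instead of a lookahead assertion. Additionally, the regex also matches a percent sign at the end of the string, alone or followed by a single hex digit, since it cannot be followed by two hex digits:

package main

import (
    "fmt"
    "regexp"
)

// plainPercentRe matches a '%' that does not start a valid %XX escape:
// at most one hex digit, then a non-hex character or the end of the string.
var plainPercentRe = regexp.MustCompile(`%[[:xdigit:]]?(?:[[:^xdigit:]]|$)`)

// findPlainPercentIndices returns the byte offsets of all such '%' characters.
func findPlainPercentIndices(s string) []int {
    indices := []int{}
    idx := 0

    for {
        // Search only the part of s that has not been scanned yet.
        m := plainPercentRe.FindStringIndex(s[idx:])
        if m == nil {
            break
        }
        // m[0] is relative to s[idx:], so shift it back to an offset in s.
        nextidx := idx + m[0]
        indices = append(indices, nextidx)
        // Resume one byte past the matched '%' so overlapping matches are found.
        idx = nextidx + 1
    }

    return indices
}

func main() {
    str := "%d%2524e%25f%255E00%%%252611%25%%"
    //      012345678901234567890123456789012
    //      0         1         2         3
    fmt.Printf("OK: %#v\n", findPlainPercentIndices(str))
    // OK: []int{0, 19, 20, 31, 32}
}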
