Splitter in Golang

Below is the Java code; I need something similar in Go:

List<String> tokens = Lists.newArrayList(Splitter.on(CharMatcher.anyOf("[]//"))
.trimResults().omitEmptyStrings().split(entry.getValue()));

This is what I have tried:

re := regexp.MustCompile(`[//]`)
tokens := re.Split(entry, -1)

Using regexp is usually slower than doing it manually. Since the task is not complex, the non-regexp solution isn't complicated either.
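For comparison, here is a minimal sketch of what a regexp-based version could look like. The attempt in the question uses the character class `[//]`, which only matches a slash; the brackets have to be escaped inside the class to be matched as well. The names sepRe and splitRegexp are made up for this illustration, and the trimming and empty-token filtering still have to be done by hand.

```go
package main

import (
    "fmt"
    "regexp"
    "strings"
)

// sepRe matches a single separator character: '[', ']' or '/'.
// It is compiled once at package level so repeated calls reuse it.
var sepRe = regexp.MustCompile(`[\[\]/]`)

// splitRegexp (hypothetical name) splits s on the separators,
// trims white space from each piece and drops empty pieces.
func splitRegexp(s string) (tokens []string) {
    for _, part := range sepRe.Split(s, -1) {
        part = strings.TrimSpace(part)
        if part != "" {
            tokens = append(tokens, part)
        }
    }
    return
}

func main() {
    fmt.Printf("%q\n", splitRegexp("a[b]c[ de/ / fg "))
}
```

This works, but it is no shorter than a version built on the strings package alone.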

You may use strings.FieldsFunc() to split a string on a set of characters, and strings.TrimSpace() to strip leading and trailing white space.

Here's a simple function doing what you want:

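// split cuts s at every rune contained in sep, trims white space
// from each piece and drops the pieces that end up empty.
func split(s, sep string) (tokens []string) {
    fields := strings.FieldsFunc(s, func(r rune) bool {
        // r is a separator if it occurs anywhere in sep.
        return strings.IndexRune(sep, r) != -1
    })
    for _, s2 := range fields {
        s2 = strings.TrimSpace(s2)
        if s2 != "" {
            tokens = append(tokens, s2)
        }
    }
    return
}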

Testing it:

fmt.Printf("%q\n", split("a,b;c, de; ; fg ", ",;"))
fmt.Printf("%q\n", split("a[b]c[ de/ / fg ", "[]/"))

Output (try it on the Go Playground):

["a" "b" "c" "de" "fg"]
["a" "b" "c" "de" "fg"]

Improvements

If performance is an issue and you have to call this split() function many times, it pays off to create a set-like map from the separator characters once and reuse it. Inside the function passed to strings.FieldsFunc() you can then simply check whether the rune is in this map, so you do not need to call strings.IndexRune() to decide whether a given rune is a separator character.

The performance gain might not be significant if you have only a few separator characters (like 1-3), but if you have many more, using a map could improve performance significantly.

This is what it could look like:

var (
    sep1 = map[rune]bool{',': true, ';': true}
    sep2 = map[rune]bool{'[': true, ']': true, '/': true}
)

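// split cuts s at every rune that is a key in sep, trims white space
// from each piece and drops the pieces that end up empty.
func split(s string, sep map[rune]bool) (tokens []string) {
    fields := strings.FieldsFunc(s, func(r rune) bool {
        // A rune missing from the map yields false, the zero value.
        return sep[r]
    })
    for _, s2 := range fields {
        s2 = strings.TrimSpace(s2)
        if s2 != "" {
            tokens = append(tokens, s2)
        }
    }
    return
}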

Testing it:

fmt.Printf("%q\n", split("a,b;c, de; ; fg ", sep1))
fmt.Printf("%q\n", split("a[b]c[ de/ / fg ", sep2))

The output is the same. Try this one on the Go Playground.
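If the separators arrive as a string rather than being known up front, the set-like map can be built by a small helper instead of being written out as a literal. The following is only a sketch: newSepSet is a name invented for this example, and split is the map-based function from above, repeated so that the program is complete.

```go
package main

import (
    "fmt"
    "strings"
)

// newSepSet (hypothetical helper) turns a string of separator
// characters into a set-like map that can be built once and reused.
func newSepSet(seps string) map[rune]bool {
    set := make(map[rune]bool, len(seps))
    for _, r := range seps {
        set[r] = true
    }
    return set
}

// split is the map-based version: cut s at every rune in sep,
// trim each piece and drop the empty ones.
func split(s string, sep map[rune]bool) (tokens []string) {
    fields := strings.FieldsFunc(s, func(r rune) bool {
        return sep[r]
    })
    for _, s2 := range fields {
        s2 = strings.TrimSpace(s2)
        if s2 != "" {
            tokens = append(tokens, s2)
        }
    }
    return
}

func main() {
    sep := newSepSet("[]/") // build once, reuse for every call
    fmt.Printf("%q\n", split("a[b]c[ de/ / fg ", sep))
}
```

The map should be created once, outside any hot loop; rebuilding it on every call would cancel out the benefit of the cheaper lookup.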
