How to remove redundant spaces/whitespace from a string in Golang?

Question

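I was wondering how to remove:

All leading/trailing whitespace or newline characters, null characters, etc.
Any redundant spaces within a string (e.g. "hello[space][space]world" would be converted to "hello[space]world")

Is this possible with a single regex, with Unicode support for international space characters, etc.?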

Answer 1

You can get quite far just using the strings package, as strings.Fields does most of the work for you:

package main

import (
    "fmt"
    "strings"
)

func standardizeSpaces(s string) string {
    return strings.Join(strings.Fields(s), " ")
}

func main() {
    tests := []string{" Hello,   World  ! ", "Hello,\tWorld ! ", " \t\n\t Hello,\tWorld\n!\n\t"}
    for _, test := range tests {
        fmt.Println(standardizeSpaces(test))
    }
}
// "Hello, World !"
// "Hello, World !"
// "Hello, World !"
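The question also asks about null characters. strings.Fields splits on unicode.IsSpace, which does not treat NUL (U+0000) as whitespace, so a NUL survives inside a field. A minimal sketch, assuming NUL should be handled like whitespace, swaps in strings.FieldsFunc with a custom predicate; the names isSpaceOrNull and normalizeSpaces are hypothetical:

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// isSpaceOrNull reports whether r is Unicode whitespace or a NUL character.
// Treating NUL as a separator is an assumption made for this sketch.
func isSpaceOrNull(r rune) bool {
	return unicode.IsSpace(r) || r == 0
}

// normalizeSpaces splits on runs of whitespace/NUL and rejoins the pieces
// with single spaces, which also drops leading and trailing separators.
func normalizeSpaces(s string) string {
	return strings.Join(strings.FieldsFunc(s, isSpaceOrNull), " ")
}

func main() {
	// NUL bytes, ideographic spaces (U+3000) and a no-break space (U+00A0).
	input := "\x00 hello\u3000\u3000world\u00a0!\n\x00"
	fmt.Println(normalizeSpaces(input)) // hello world !
}
```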

Answer 2

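It seems you might want to use both the \s shorthand character class and the \p{Zs} Unicode property to match Unicode spaces. However, the two steps cannot be combined into a single regex replacement, because they need different replacement strings, and ReplaceAllStringFunc only passes the whole matched string as its argument (I don't know of a way to check which group matched).

Thus, I suggest using two regexps:

^[\s\p{Zs}]+|[\s\p{Zs}]+$ - matches all leading/trailing whitespace
[\s\p{Zs}]{2,} - matches 2 or more whitespace symbols inside the string

Sample code: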

package main

import (
    "fmt"
    "regexp"
)

func main() {
    input := "   Text   More here     "
    re_leadclose_whtsp := regexp.MustCompile(`^[\s\p{Zs}]+|[\s\p{Zs}]+$`)
    re_inside_whtsp := regexp.MustCompile(`[\s\p{Zs}]{2,}`)
    final := re_leadclose_whtsp.ReplaceAllString(input, "")
    final = re_inside_whtsp.ReplaceAllString(final, " ")
    fmt.Println(final)
}
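Why both classes are needed: in Go's RE2 syntax, \s is ASCII-only (equivalent to [\t\n\f\r ]), so on its own it does not match Unicode space separators such as the ideographic space U+3000. A small sketch comparing the two patterns on the same input (collapseASCIIOnly and collapseWithZs are hypothetical names):

```go
package main

import (
	"fmt"
	"regexp"
)

// \s in Go's regexp syntax only matches ASCII whitespace: [\t\n\f\r ].
var asciiRun = regexp.MustCompile(`\s{2,}`)

// Adding \p{Zs} also matches Unicode space separators such as U+3000.
var unicodeRun = regexp.MustCompile(`[\s\p{Zs}]{2,}`)

func collapseASCIIOnly(s string) string { return asciiRun.ReplaceAllString(s, " ") }
func collapseWithZs(s string) string    { return unicodeRun.ReplaceAllString(s, " ") }

func main() {
	input := "hello\u3000\u3000world" // two ideographic spaces
	fmt.Println(collapseASCIIOnly(input) == input)      // true: the U+3000 run is left alone
	fmt.Println(collapseWithZs(input) == "hello world") // true: the run becomes one space
}
```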

Answer 3

strings.Fields() splits on any amount of white space, thus:

strings.Join(strings.Fields(strings.TrimSpace(s)), " ")
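The strings.TrimSpace call is not strictly needed: TrimSpace and Fields both use the Unicode definition of whitespace (unicode.IsSpace), and Fields never returns empty leading or trailing fields. A small sketch that checks this on a few sample inputs (withTrim and withoutTrim are hypothetical names):

```go
package main

import (
	"fmt"
	"strings"
)

// withTrim is the one-liner above; withoutTrim drops the TrimSpace call.
func withTrim(s string) string {
	return strings.Join(strings.Fields(strings.TrimSpace(s)), " ")
}

func withoutTrim(s string) string {
	return strings.Join(strings.Fields(s), " ")
}

func main() {
	inputs := []string{" Hello,   World  ! ", "\t\n a \u00a0 b \n", "", "   "}
	for _, s := range inputs {
		fmt.Println(withTrim(s) == withoutTrim(s)) // true for every input
	}
}
```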

Answer 4

Avoid time-wasting regexps and external libraries
I chose plain Go instead of regexp, because every language has special characters that are not ASCII.

Go Golang!

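// RemoveDoubleWhiteSpace drops an ASCII space (byte 32) whenever the next
// byte is also a space, so each run of spaces shrinks to a single space.
// It works on bytes, so multi-byte UTF-8 characters are copied unchanged.
func RemoveDoubleWhiteSpace(str string) string {
    var b strings.Builder
    b.Grow(len(str))
    for i := 0; i < len(str); i++ {
        if !(str[i] == ' ' && i+1 < len(str) && str[i+1] == ' ') {
            b.WriteByte(str[i])
        }
    }
    return b.String()
}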

And the related test:

func TestRemoveDoubleWhiteSpace(t *testing.T) {
    data := []string{`  test`, `test  `, `te  st`}
    for _, item := range data {
        str := RemoveDoubleWhiteSpace(item)
        t.Log("Data ->|"+item+"|Found: |"+str+"| Len: ", len(str))
        if len(str) != 5 {
            t.Fail()
        }
    }
}
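The function above only looks at the ASCII space byte, so tabs, newlines and non-ASCII spaces such as U+3000 are left untouched, and a single leading or trailing space is kept. A minimal sketch of a rune-based variant, still without regexp, that collapses every unicode.IsSpace run to one ASCII space and drops leading/trailing runs (collapseSpaces is a hypothetical name):

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// collapseSpaces copies s rune by rune, replacing each run of Unicode
// whitespace with one ASCII space and dropping leading/trailing runs.
func collapseSpaces(s string) string {
	var b strings.Builder
	b.Grow(len(s))
	pendingSpace := false // a whitespace run was seen since the last written rune
	for _, r := range s {
		if unicode.IsSpace(r) {
			pendingSpace = true
			continue
		}
		// Emit one separator only between two non-space runes.
		if pendingSpace && b.Len() > 0 {
			b.WriteByte(' ')
		}
		pendingSpace = false
		b.WriteRune(r)
	}
	return b.String()
}

func main() {
	fmt.Println(collapseSpaces("  héllo \t\u3000 wörld  ")) // héllo wörld
}
```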

Answer 5

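Use a single regexp, compiled with regexp.MustCompile(), to match every run of whitespace and replace it with a single space, then trim the leading and trailing space.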

package main

import (
    "fmt"
    "regexp"
    "strings"
)

func main() {
    input := "    Text   More here        "
    re := regexp.MustCompile(`\s+`)
    out := re.ReplaceAllString(input, " ")
    out = strings.TrimSpace(out)
    fmt.Println(out)
}

Or, with "_" instead of spaces:

package main

import (
    "fmt"
    "regexp"
    "strings"
)

func main() {
    input := "___Text___More_here______"
    re := regexp.MustCompile(`_+`)
    out := re.ReplaceAllString(input, "_")
    out = strings.Trim(out, "_")
    fmt.Println(out)
}
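The first snippet uses \s+, which in Go matches only ASCII whitespace, so a run of U+3000 ideographic spaces between words would survive (strings.TrimSpace only cleans up the ends). A sketch of the same single-regexp idea using the character class from the accepted answer, under the assumption that \p{Zs} covers the Unicode spaces you care about (collapseUnicodeSpaces is a hypothetical name):

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// wsRun matches one or more ASCII whitespace characters (\s) or Unicode
// space separators (\p{Zs}); it is compiled once at package level.
var wsRun = regexp.MustCompile(`[\s\p{Zs}]+`)

// collapseUnicodeSpaces turns every whitespace run into one ASCII space,
// then trims the single space that may remain at either end.
func collapseUnicodeSpaces(s string) string {
	return strings.TrimSpace(wsRun.ReplaceAllString(s, " "))
}

func main() {
	input := "\u3000 Text\u3000\u3000More  here \u00a0"
	fmt.Println(collapseUnicodeSpaces(input)) // Text More here
}
```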

Answer 6

// Ref: https://stackoverflow.com/a/42251527/18152508
func StrFields(input string) string {
    return strings.Join(strings.Fields(input), " ")
}

// Ref: https://stackoverflow.com/a/37293398/18152508
func RegexReplace(input string) string {
    re_leadclose_whtsp := regexp.MustCompile(`^[\s\p{Zs}]+|[\s\p{Zs}]+$`)
    re_inside_whtsp := regexp.MustCompile(`[\s\p{Zs}]{2,}`)

    return re_inside_whtsp.ReplaceAllString(
        re_leadclose_whtsp.ReplaceAllString(input, ""),
        " ",
    )
}

// Ref: https://stackoverflow.com/a/67152714/18152508
func SingleRegexp(input string) string {
    re := regexp.MustCompile(`\s+`)

    return strings.TrimSpace(re.ReplaceAllString(input, " "))
}

Benchmarking the 3 functions above, @ifross's approach using the strings.Fields function was much faster than both the accepted answer and the single-regexp approach.

Therefore, I prefer the combination of strings.Fields and strings.Join when I only need to reduce redundant and repeated spaces in a string.

$ go test -bench . -count 30 ./... > bench.txt && benchstat bench.txt
name            time/op
StrFields-4      204ns ± 1%
RegexReplace-4  8.35µs ± 2%
SingleRegexp-4  2.49µs ± 1%

var testData = []string{
    " Hello,   World  ! ",
    // "Hello,\tWorld ! ",
    // " \t\n\t Hello,\tWorld\n!\n\t",
}

var testFunc = []struct {
    name string
    exec func(string) string
}{
    {name: "StrFields", exec: StrFields},
    {name: "RegexReplace", exec: RegexReplace},
    {name: "SingleRegexp", exec: SingleRegexp},
}

func Test(t *testing.T) {
    for _, targetFunc := range testFunc {
        t.Run(targetFunc.name, func(t *testing.T) {
            for index, input := range testData {
                expect := "Hello, World !"
                actual := targetFunc.exec(input)

                if expect != actual {
                    t.Errorf("#%d: expect %q, actual %q", index+1, expect, actual)
                }
            }
        })
    }
}

func BenchmarkStrFields(b *testing.B) {
    const data = " Hello,   World  ! "

    for i := 0; i < b.N; i++ {
        _ = StrFields(data)
    }
}

func BenchmarkRegexReplace(b *testing.B) {
    const data = " Hello,   World  ! "

    for i := 0; i < b.N; i++ {
        _ = RegexReplace(data)
    }
}

func BenchmarkSingleRegexp(b *testing.B) {
    const data = " Hello,   World  ! "

    for i := 0; i < b.N; i++ {
        _ = SingleRegexp(data)
    }
}
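One caveat about these numbers: RegexReplace and SingleRegexp call regexp.MustCompile on every invocation, so regexp compilation is included in their time/op. A sketch that hoists compilation to package level and times the functions with testing.Benchmark, which the testing package documents as usable outside go test, follows. No timing figures are claimed for it, and the *Precompiled names are hypothetical:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"testing"
)

// Compiled once, outside the measured functions.
var (
	reLeadClose = regexp.MustCompile(`^[\s\p{Zs}]+|[\s\p{Zs}]+$`)
	reInside    = regexp.MustCompile(`[\s\p{Zs}]{2,}`)
	reSingle    = regexp.MustCompile(`\s+`)
)

func StrFields(input string) string {
	return strings.Join(strings.Fields(input), " ")
}

func RegexReplacePrecompiled(input string) string {
	return reInside.ReplaceAllString(reLeadClose.ReplaceAllString(input, ""), " ")
}

func SingleRegexpPrecompiled(input string) string {
	return strings.TrimSpace(reSingle.ReplaceAllString(input, " "))
}

func main() {
	const data = " Hello,   World  ! "
	funcs := []struct {
		name string
		exec func(string) string
	}{
		{"StrFields", StrFields},
		{"RegexReplacePrecompiled", RegexReplacePrecompiled},
		{"SingleRegexpPrecompiled", SingleRegexpPrecompiled},
	}
	for _, f := range funcs {
		// testing.Benchmark grows b.N until the run is long enough, then
		// returns a BenchmarkResult whose String method reports ns/op.
		res := testing.Benchmark(func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				_ = f.exec(data)
			}
		})
		fmt.Printf("%-26s %s\n", f.name, res)
	}
}
```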

Answer 7

Use a regular expression for this.

package main

import (
    "bytes"
    "fmt"
    "regexp"
)

func main() {
    data := []byte("   Hello,   World !   ")
    re := regexp.MustCompile("  +") // two or more consecutive spaces
    replaced := re.ReplaceAll(bytes.TrimSpace(data), []byte(" "))
    fmt.Println(string(replaced)) // Hello, World !
}
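bytes.TrimSpace does not remove NUL bytes, because NUL is not Unicode whitespace. A minimal sketch of the bytes.Trim variant, assuming a cutset of ASCII whitespace plus \x00 (the cutset constant and the clean function are hypothetical):

```go
package main

import (
	"bytes"
	"fmt"
	"regexp"
)

// cutset lists the bytes to strip from both ends: ASCII whitespace plus NUL.
// The exact set is an assumption for this sketch.
const cutset = " \t\n\r\v\f\x00"

// multiSpace matches two or more consecutive ASCII spaces.
var multiSpace = regexp.MustCompile(`  +`)

func clean(data []byte) []byte {
	trimmed := bytes.Trim(data, cutset)
	return multiSpace.ReplaceAll(trimmed, []byte(" "))
}

func main() {
	data := []byte("\x00\n   Hello,   World !   \x00")
	fmt.Printf("%q\n", clean(data)) // prints "Hello, World !"
}
```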

bytes.TrimSpace already removes leading/trailing newlines; to also trim null characters, you can use the bytes.Trim(s []byte, cutset string) function instead of bytes.TrimSpace, with a cutset that includes them.

7 solutions

Solution 1
Score 73, 2017-02-15 14:03:36

Solution 2
Score 27, accepted, 2016-05-18 07:51:20

Solution 3
Score 7, 2019-03-31 03:08:43

Solution 4
Score 0, 2019-11-03 21:24:09

Solution 5
Score 0, 2021-04-18 19:41:17

Solution 6
Score 0, 2022-12-09 15:39:10

Solution 7
Score -1, 2016-05-18 05:24:11
