Golang: Why does regexp.FindAllStringSubmatch() return [][]string and not []string?

I am fairly new to Go, and this is the first time I have had to deal with regexp.

I am a bit surprised that someregex.FindAllStringSubmatch("somestring", -1) returns a slice of slices, [][]string, instead of a simple slice of strings, []string.

Example:

someRegex, _ := regexp.Compile("^.*(mes).*$")
matches := someRegex.FindAllStringSubmatch("somestring", -1)
fmt.Println(matches) // logs [[somestring mes]]

What is the reason for this behavior? I can't figure it out.

The func (*Regexp) FindAllStringSubmatch extracts matches and captured submatches.

A submatch is the part of the text matched by a part of the regex that is enclosed in a pair of unescaped parentheses (a so-called capturing group).

In your case, ^.*(mes).*$ matches:

  • ^ - start of string
  • .* - any zero or more chars other than newline, as many as possible
  • (mes) - Capturing group 1: the substring mes
  • .*$ - the rest of the string.

So, the match value is the whole string, and it is the first value in the output. Then, since there is a capturing group, there must be a place for it in the results; hence, mes is placed as the second item in the list.

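Since there may be more than one match, we need a list of lists: the outer slice holds one entry per match, and each inner slice holds that match's full text followed by its captured groups.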

A better example may be one that extracts several matches and submatches (with an optional group, too):

package main

import (
    "fmt"
    "regexp"
)

func main() {
    someRegex, _ := regexp.Compile(`[^aouiye]([aouiye])([^aouiye])?`)
    matches := someRegex.FindAllStringSubmatch("somestri", -1)
    fmt.Printf("%q\n", matches)
}

The pattern [^aouiye]([aouiye])([^aouiye])? matches a non-vowel, a vowel, and an optional non-vowel, capturing the last two into separate groups #1 and #2.

The results are [["som" "o" "m"] ["ri" "i" ""]]. There are 2 matches, and each contains a match value, a Group 1 value and a Group 2 value. Since the ri match has no text captured into Group 2 (([^aouiye])?), that value is empty, but it is still present because the group is defined in the regex pattern.

FindAllStringSubmatch is the 'All' version of FindStringSubmatch; it returns a slice of all successive matches of the expression, as defined by the 'All' description in the package comment. A return value of nil indicates no match.

Docs.

To sum up: you need a slice of slices of strings because this is the 'All' version of FindStringSubmatch. FindStringSubmatch returns a single slice of strings: the first match and its submatches.
