Create a suffix tree in Golang
I have an array of strings and I need to create a suffix tree out of it in Golang. The SuffixArray in Golang does not meet my needs, because it only accepts a byte array (i.e. a single string). Could anybody provide pointers for an implementation? Thanks in advance.
Here is an example of how to use a suffix array to do auto-completion (playground).
Note that I first joined all the strings together, prefixing each one with \x00, a byte which can't occur in the strings.
package main

import (
	"fmt"
	"index/suffixarray"
	"regexp"
	"strings"
)

func main() {
	words := []string{
		"aardvark",
		"happy",
		"hello",
		"hero",
		"he",
		"hotel",
	}
	// use \x00 to start each string
	joinedStrings := "\x00" + strings.Join(words, "\x00")
	sa := suffixarray.New([]byte(joinedStrings))

	// User has typed in "he"
	match, err := regexp.Compile("\x00he[^\x00]*")
	if err != nil {
		panic(err)
	}
	ms := sa.FindAllIndex(match, -1)

	for _, m := range ms {
		start, end := m[0], m[1]
		// start+1 skips the leading \x00 separator
		fmt.Printf("match = %q\n", joinedStrings[start+1:end])
	}
}
Prints:
match = "hello"
match = "hero"
match = "he"
What you want is called a generalized suffix tree. A simple way to build such a tree is to append a different end marker (a symbol not used in any of the strings) to each string, concatenate them, and build a normal suffix tree for the concatenated string. So you just need to add "hello world" to the string set and use:
match, err := regexp.Compile("[^\x00]*wor[^\x00]*")
to get the strings that contain "wor". Note that with this pattern the correct string is joinedStrings[start:end], because the match no longer begins with the \x00 separator.
I created an implementation of a suffix tree with O(n) complexity, where n is the length of the string: https://github.com/twelvedata/searchindex
More details are in my article on Medium: https://medium.com/twelve-data/in-memory-text-search-index-for-quotes-on-go-5243adc62c26