
Create a suffix tree in Golang

I have an array of strings and I need to create a suffix tree out of it in Golang. SuffixArray in Golang does not meet my needs, because it only accepts a byte array (i.e. a single string). Could anybody provide pointers for an implementation? Thanks in advance.

Here is an example of how to use a suffix array to do auto-completion (playground).

Note that I first joined all the strings together, each prefixed with \x00, a byte that can't occur in the strings themselves.

package main

import (
    "fmt"
    "index/suffixarray"
    "regexp"
    "strings"
)

func main() {
    words := []string{
        "aardvark",
        "happy",
        "hello",
        "hero",
        "he",
        "hotel",
    }
    // use \x00 to start each string
    joinedStrings := "\x00" + strings.Join(words, "\x00")
    sa := suffixarray.New([]byte(joinedStrings))

    // User has typed in "he"
    match, err := regexp.Compile("\x00he[^\x00]*")
    if err != nil {
        panic(err)
    }
    ms := sa.FindAllIndex(match, -1)

    for _, m := range ms {
        start, end := m[0], m[1]
        fmt.Printf("match = %q\n", joinedStrings[start+1:end])
    }
}

Prints

match = "hello"
match = "hero"
match = "he"

What you want is called a generalized suffix tree. A simple way to build one is to append a different end marker (a symbol not used in any of the strings) to each string, concatenate them, and build a normal suffix tree for the concatenated string. With the suffix array example above, you just need to add "hello world" to the string set and use:

match, err := regexp.Compile("[^\x00]*wor[^\x00]*")

to get the strings that contain "wor". Note that the matching string is then joinedStrings[start:end]: the pattern no longer begins with the \x00 separator, so there is no leading byte to skip.

I created an implementation of a suffix tree with O(n) complexity, where n is the length of the string: https://github.com/twelvedata/searchindex

More details are in my article on Medium: https://medium.com/twelve-data/in-memory-text-search-index-for-quotes-on-go-5243adc62c26
