Fast way to find substring in text using suffix array and lcp

Question

I'm trying to find the words that contain a given substring (the input) in a huge text. The text looks like this: *america*python*erica*escape*.. Example: Input: "rica" => Output: america,erica

I use a suffix array.

My pseudocode (pythonlike) is:

firstChar = input[0]                 # the first character of input
suffixArray = getSuffixArray(text)   # suffix array of text
result = []
length = len(input)

for every index of suffixArray whose suffix starts with firstChar:
    indexText = suffixArray[index]   # position in text, not the character at that position

    if text[indexText: indexText + length] == input:
        word = find whole word containing indexText between '*'
        result.append(word)

This works, but it is too slow. An LCP array should improve the runtime of the algorithm, but I can't figure out how. Can you give me some advice?

Thanks in advance!

Answer 1

Free Python code for building a suffix array is available at "Effcient way to find longest duplicate string". It works for up to 100 million characters on a personal computer.
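
Once the suffix array exists, the lookup itself could go roughly like the sketch below. Because the suffixes are sorted, all suffixes that begin with the input sit in one contiguous range, so two binary searches can locate that range instead of checking every suffix that starts with the first character. The names build_suffix_array and find_words are hypothetical, the naive builder is only there to keep the example self-contained (it is not the code from the link and will not scale to huge text), and this sketch uses plain binary search rather than the LCP array.

```python
def build_suffix_array(text):
    # Naive construction, for illustration only (assumption: small text).
    # Swap in a real suffix array builder for huge inputs.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_words(text, suffix_array, pattern, sep="*"):
    """Return the sorted, de-duplicated words (delimited by sep) containing pattern."""
    m = len(pattern)
    n = len(suffix_array)

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = n
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    end = lo

    # Every suffix in [start, end) begins with pattern; expand each hit
    # to the enclosing word between separators.
    words = set()
    for pos in suffix_array[start:end]:
        left = text.rfind(sep, 0, pos) + 1   # 0 when no separator precedes pos
        right = text.find(sep, pos)
        if right == -1:
            right = len(text)
        words.add(text[left:right])
    return sorted(words)


text = "*america*python*erica*escape*"
sa = build_suffix_array(text)
print(find_words(text, sa, "rica"))
```

For the example text this prints ['america', 'erica']. Each query costs about len(input) * log(len(text)) character comparisons plus the size of the matching range. An LCP array can reduce the comparison cost of the binary search further, but that refinement is not shown here.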

solution1
0 2015-01-13 21:34:35
