简体   繁体   中英

Suffix Array Implementation in Java

I'm looking to write an efficient n-order Markov chain method to generate random text strings given a set of example text. I currently have a Java implementation that uses several layers of Maps, but it's clunky. A suffix array would be perfect for my needs, but I'm not clear if that can be efficiently implemented in Java.

In CI might do something like:

char exampleText[MAX];
char *suffixArray[MAX];
...
while(n<MAX && suffixArray[n++] = &exampleText[n]);
sort(suffixArray);

This gets gnarly in Java since I'd have to take substrings of exampleText , or turn suffixArray into an array of indices, or something else.

Any suggestions for a good approach to this in Java?

String will [typically] do that for you. (Typical implementations share backing arrays when created with substring , although that is subject to change at any time.)

You can make some variants form array of suffixes:

First:

public static String[] suffixes(String s)
{
int N = s.length();
String[] suffixes = new String[N];
for (int i = 0; i < N; i++)
suffixes[i] = s.substring(i, N);
return suffixes;
}

Second:

public static String[] suffixes(String s)
{
int N = s.length();
StringBuilder sb = new StringBuilder(s);
String[] suffixes = new String[N];
for (int i = 0; i < N; i++)
suffixes[i] = sb.substring(i, N);
return suffixes;
}

For anyone interested in more efficient ways of constructing the suffix array in Java, I once used a library called jsuffixarrays . The code is here . It offers a range of construction algorithms to choose from and I found it to work very well. Eg to use the SKEW algorithm you do this:

import org.jsuffixarrays.Algorithm;
import org.jsuffixarrays.ISuffixArrayBuilder;
import org.jsuffixarrays.SuffixArrays;
import org.jsuffixarrays.SuffixData;

String              exampleText = "....";
ISuffixArrayBuilder suxBuilder  = Algorithm.SKEW.getDecoratedInstance();
SuffixData          sux         = SuffixArrays.createWithLCP(text,sux_builder);

/* Then, to access the suffix array: */
sux.getSuffixArray();
/* And, to access the LCP array: */
sux.getLCP();

You can build without the LCP array if don't need that.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM