
Suffix Array Implementation in Java

I'm looking to write an efficient order-n Markov chain method that generates random text strings from a set of example text. I currently have a Java implementation that uses several layers of Maps, but it's clunky. A suffix array would be perfect for my needs, but I'm not sure whether one can be implemented efficiently in Java.

In C I might do something like:

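char exampleText[MAX];
char *suffixArray[MAX];
...
/* each entry points at the suffix starting at position n; nothing is copied */
size_t len = strlen(exampleText);
for (size_t n = 0; n < len; n++)
    suffixArray[n] = &exampleText[n];
/* sort the pointers by the suffixes they point to; cmpSuffix is a strcmp-based comparator (not shown) */
qsort(suffixArray, len, sizeof suffixArray[0], cmpSuffix);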

This gets gnarly in Java, since I'd have to take substrings of exampleText, turn suffixArray into an array of indices, or do something else.

Any suggestions for a good approach to this in Java?
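The "array of indices" option is the closest Java equivalent of the C pointer array: each int stands for the suffix starting at that position, and the comparator walks the characters directly, so no substrings are created. Below is a minimal sketch; the class and method names are made up for illustration. A plain comparison sort like this is O(n² log n) in the worst case (highly repetitive text), which is fine for modest corpora but is not a dedicated suffix array construction algorithm.

```java
import java.util.Arrays;

public class SuffixArrayDemo {
    // Builds a suffix array as an array of start indices, the Java analogue
    // of the C array of char pointers. No substrings are created.
    static Integer[] buildSuffixArray(String text) {
        Integer[] sa = new Integer[text.length()];
        for (int i = 0; i < sa.length; i++) {
            sa[i] = i; // sa[i] initially stands for the suffix starting at i
        }
        // Sort indices by comparing the suffixes they denote, character by character.
        Arrays.sort(sa, (a, b) -> compareSuffixes(text, a, b));
        return sa;
    }

    // Lexicographic comparison of text[a..] and text[b..] without copying.
    static int compareSuffixes(String text, int a, int b) {
        int n = text.length();
        while (a < n && b < n) {
            char ca = text.charAt(a), cb = text.charAt(b);
            if (ca != cb) return Character.compare(ca, cb);
            a++;
            b++;
        }
        // One suffix is a prefix of the other; the shorter one sorts first.
        return Integer.compare(n - a, n - b);
    }

    public static void main(String[] args) {
        String text = "banana";
        for (int i : buildSuffixArray(text)) {
            System.out.println(i + " " + text.substring(i)); // substring only for display
        }
    }
}
```

For "banana" this yields the start positions 5, 3, 1, 0, 4, 2 (a, ana, anana, banana, na, nana). Sorting an Integer[] rather than an int[] is needed because Arrays.sort only accepts a custom Comparator for object arrays; the boxing costs some memory but keeps the code short.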

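String used to do that for you. Older JDKs (before Java 7 update 6) shared the backing char array when a string was created with substring, so each suffix was cheap. That was always an implementation detail, though, and current JDKs copy the characters on substring, so an array of all N suffixes built this way costs O(N²) time and memory.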

There are a couple of ways to build the array of suffixes:

First:

public static String[] suffixes(String s)
{
    int N = s.length();
    String[] suffixes = new String[N];
    // suffixes[i] is the suffix of s starting at index i
    for (int i = 0; i < N; i++)
        suffixes[i] = s.substring(i, N);
    return suffixes;
}

Second:

public static String[] suffixes(String s)
{
    int N = s.length();
    StringBuilder sb = new StringBuilder(s);
    String[] suffixes = new String[N];
    // StringBuilder.substring always returns a new String with copied characters
    for (int i = 0; i < N; i++)
        suffixes[i] = sb.substring(i, N);
    return suffixes;
}
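Whichever way the suffixes are built, what makes a sorted suffix array useful for an order-k Markov chain is that all suffixes beginning with a given k-character context sit next to each other. Two binary searches find that range, and the characters immediately after those occurrences form the empirical next-character distribution. The sketch below is a hypothetical, character-level illustration (the class and method names are not from any library); it stores start indices instead of substrings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical character-level order-k Markov model backed by a suffix array.
public class MarkovSuffixArray {
    private final String text;
    private final Integer[] sa; // suffix start positions, sorted lexicographically

    public MarkovSuffixArray(String text) {
        this.text = text;
        this.sa = new Integer[text.length()];
        for (int i = 0; i < sa.length; i++) sa[i] = i;
        Arrays.sort(sa, (a, b) -> compareSuffixes(a, b));
    }

    // Lexicographic comparison of the suffixes at a and b, without copying.
    private int compareSuffixes(int a, int b) {
        int n = text.length();
        while (a < n && b < n) {
            int c = Character.compare(text.charAt(a++), text.charAt(b++));
            if (c != 0) return c;
        }
        return Integer.compare(n - a, n - b); // shorter suffix sorts first
    }

    // Compares the first ctx.length() characters of the suffix at pos with ctx.
    // A suffix that runs out while still matching ctx sorts lower, consistent with compareSuffixes.
    private int comparePrefix(int pos, String ctx) {
        for (int j = 0; j < ctx.length(); j++) {
            if (pos + j >= text.length()) return -1;
            int c = Character.compare(text.charAt(pos + j), ctx.charAt(j));
            if (c != 0) return c;
        }
        return 0; // the suffix starts with ctx
    }

    // First index in sa whose suffix is >= ctx (strict == false) or > ctx (strict == true),
    // looking only at the first ctx.length() characters.
    private int bound(String ctx, boolean strict) {
        int lo = 0, hi = sa.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            int c = comparePrefix(sa[mid], ctx);
            if (c < 0 || (strict && c == 0)) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }

    // Characters that follow an occurrence of ctx in the text, with multiplicity.
    public List<Character> followers(String ctx) {
        List<Character> out = new ArrayList<>();
        int lo = bound(ctx, false), hi = bound(ctx, true);
        for (int i = lo; i < hi; i++) {
            int next = sa[i] + ctx.length();
            if (next < text.length()) out.add(text.charAt(next)); // skip a match at the very end
        }
        return out;
    }

    // Extends seed to at most `length` characters, using the last seed.length() characters as context.
    public String generate(String seed, int length, Random rnd) {
        int k = seed.length();
        StringBuilder sb = new StringBuilder(seed);
        while (sb.length() < length) {
            List<Character> f = followers(sb.substring(sb.length() - k));
            if (f.isEmpty()) break; // context is never followed by anything
            sb.append(f.get(rnd.nextInt(f.size())));
        }
        return sb.toString();
    }
}
```

For example, in "banana" the context "an" occurs twice and is followed by 'a' both times, while "na" is followed by 'n' once (the other "na" ends the text), so generate("ba", 6, rnd) reproduces "banana" for any Random. Picking uniformly among the followers samples each next character in proportion to how often it follows the context, which is what the layered Maps would otherwise store explicitly.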

For anyone interested in more efficient ways of constructing the suffix array in Java, I once used a library called jsuffixarrays. The code is here. It offers a range of construction algorithms to choose from, and I found that it worked very well. For example, to use the SKEW algorithm you do this:

import org.jsuffixarrays.Algorithm;
import org.jsuffixarrays.ISuffixArrayBuilder;
import org.jsuffixarrays.SuffixArrays;
import org.jsuffixarrays.SuffixData;

String              exampleText = "....";
ISuffixArrayBuilder suxBuilder  = Algorithm.SKEW.getDecoratedInstance();
SuffixData          sux         = SuffixArrays.createWithLCP(exampleText, suxBuilder);

/* Then, to access the suffix array: */
sux.getSuffixArray();
/* And, to access the LCP array: */
sux.getLCP();

You can build without the LCP array if you don't need it.
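If you build the suffix array some other way but still want LCP information (entry i holds the length of the longest common prefix of the suffixes at sorted positions i-1 and i), it can be computed from the suffix array in linear time with Kasai's algorithm. A minimal, library-independent sketch; the class name is made up for illustration and the input is assumed to be a valid suffix array of the text:

```java
public class KasaiLcp {
    // Kasai's O(n) LCP construction from a suffix array sa of text.
    // lcp[i] = longest common prefix of the suffixes sa[i-1] and sa[i]; lcp[0] = 0.
    static int[] lcp(String text, int[] sa) {
        int n = sa.length;
        int[] rank = new int[n];
        for (int i = 0; i < n; i++) rank[sa[i]] = i; // inverse permutation of sa
        int[] lcp = new int[n];
        int h = 0;
        for (int i = 0; i < n; i++) { // visit suffixes in text order
            if (rank[i] > 0) {
                int j = sa[rank[i] - 1]; // suffix just before i in sorted order
                while (i + h < n && j + h < n && text.charAt(i + h) == text.charAt(j + h)) h++;
                lcp[rank[i]] = h;
                if (h > 0) h--; // the suffix at i+1 shares at least h-1 characters with its predecessor
            } else {
                h = 0;
            }
        }
        return lcp;
    }
}
```

For "banana" with suffix array {5, 3, 1, 0, 4, 2}, this gives {0, 1, 3, 0, 0, 2}: for instance "ana" and "anana" share 3 characters, and "na" and "nana" share 2. The linear bound comes from h decreasing by at most one per step.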
