Using a Lucene Analyzer without an index: is my approach reasonable?

Question

My goal is to use some of Lucene's many tokenizers and filters to transform input text, without creating any index.

For example, given this (contrived) input string...

" Someone's - [texté] goes here, foo . "

...and a Lucene analyzer like this...

Analyzer analyzer = CustomAnalyzer.builder()
        .withTokenizer("icu")
        .addTokenFilter("lowercase")
        .addTokenFilter("icuFolding")
        .build();

I want to get the following output:

someone's texte goes here foo

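The Java method below does what I want.

But is there a better (i.e. more typical and/or more concise) way I should be doing this?

I am thinking specifically of my use of TokenStream and CharTermAttribute, since I have never used them like this before. It feels clunky.

Here is the code: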

Imports (Lucene 8.3.0, plus the java.io classes the method uses):

import java.io.IOException;
import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.custom.CustomAnalyzer;

My method:

private String transform(String input) throws IOException {

    Analyzer analyzer = CustomAnalyzer.builder()
            .withTokenizer("icu")
            .addTokenFilter("lowercase")
            .addTokenFilter("icuFolding")
            .build();

    TokenStream ts = analyzer.tokenStream("myField", new StringReader(input));
    CharTermAttribute charTermAtt = ts.addAttribute(CharTermAttribute.class);

    StringBuilder sb = new StringBuilder();
    try {
        ts.reset();
        while (ts.incrementToken()) {
            sb.append(charTermAtt.toString()).append(" ");
        }
        ts.end();
    } finally {
        ts.close();
    }
    return sb.toString().trim();
}

Answer 1

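I have been using this setup for a few weeks now without problems. I have not found a more concise approach. I think the code in the question is fine.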
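The reset / incrementToken / end / close sequence in the question is the normal TokenStream workflow, so there is not much to remove. As a sketch only (not something tested as part of this answer), the method can be tightened slightly by building the analyzer once, letting try-with-resources close the stream, and passing the String directly to tokenStream instead of wrapping it in a StringReader. The class name TextNormalizer is hypothetical, and the sketch assumes the same Lucene 8.3.0 setup as the question, including the ICU analysis module that provides the "icu" tokenizer and "icuFolding" filter.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.StringJoiner;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.custom.CustomAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical helper class; the name is illustrative only.
final class TextNormalizer {

    // Built once and reused across calls instead of rebuilding per invocation.
    private static final Analyzer ANALYZER = buildAnalyzer();

    private static Analyzer buildAnalyzer() {
        try {
            // Same analysis chain as in the question.
            return CustomAnalyzer.builder()
                    .withTokenizer("icu")
                    .addTokenFilter("lowercase")
                    .addTokenFilter("icuFolding")
                    .build();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static String transform(String input) throws IOException {
        // StringJoiner puts the separator only between tokens, so no trim() is needed.
        StringJoiner joined = new StringJoiner(" ");
        // The field name is arbitrary here because nothing is indexed.
        try (TokenStream ts = ANALYZER.tokenStream("myField", input)) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();                      // required before the first incrementToken()
            while (ts.incrementToken()) {
                joined.add(term.toString());
            }
            ts.end();                        // required after the last token
        }                                    // close() is called automatically
        return joined.toString();
    }
}
```

Calling TextNormalizer.transform(" Someone's - [texté] goes here, foo . ") runs the same chain as the question's method. Whether the shared static analyzer suits a given application (for example, one that needs to close it explicitly on shutdown) is a design choice left open here.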

Solution 1
0 Accepted 2020-03-14 23:22:16
