How can I index/search the + and - symbols in Lucene?
I need to search for the term "I+D", but my analyzer does not work with the + (plus) and - (minus) symbols. How can I search for it?
My custom analyzer:
/**
* Copyright (c) 2006 Hugo Zaragoza and Jose R. Pérez-Agüera
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* 3. Neither the name of copyright holders nor the names of its
* contributors may be used to endorse or promote products derived
* from this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
* TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
* PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL COPYRIGHT HOLDERS OR CONTRIBUTORS
* BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
* CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
* SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
* INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
* CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
* ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
* POSSIBILITY OF SUCH DAMAGE.
*/
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.util.ArrayList;
import java.util.Set;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
/**
* Spanish Lucene analyzer
* @author Hugo Zaragoza and Jose R. Pérez-Agüera
*/
public class SpanishAnalyzer extends Analyzer {
    private Set stopSet;
    /**
     * Creates the Lucene Spanish Analyzer
     * @throws IOException if the stop-word file cannot be read
     */
    public SpanishAnalyzer() throws IOException {
        super();
        stopSet = StopFilter.makeStopSet(loadStopWords());
    }
    /** Constructs a {@link StandardTokenizer} filtered by a {@link
    StandardFilter}, a {@link LowerCaseFilter}, a {@link StopFilter}
    and a SpanishStemmerFilter (class not shown in this post). */
    public TokenStream tokenStream(String fieldName, Reader reader) {
        TokenStream result = new StandardTokenizer(reader);
        result = new StandardFilter(result);
        result = new LowerCaseFilter(result);
        result = new StopFilter(result, stopSet);
        result = new SpanishStemmerFilter(result);
        return result;
    }
    /**
     * Loads the Spanish stop-words list, one word per line
     * @throws IOException if the file is missing or cannot be read
     */
    private static String[] loadStopWords() throws IOException {
        InputStream inputStream = new FileInputStream("stopwords-spanish.txt");
        //InputStream inputStream = new FileInputStream("/home/becario/Escritorio/CVTKAxel/lib/stopwords-spanish.txt");
        // The file is decoded with the platform default charset
        Reader reader = new InputStreamReader(inputStream);
        BufferedReader br = new BufferedReader(reader);
        try {
            ArrayList<String> list = new ArrayList<String>();
            String line = br.readLine();
            while (line != null) {
                list.add(line.trim());
                line = br.readLine();
            }
            // Copy the list into a String[] in a single call
            return list.toArray(new String[list.size()]);
        } finally {
            // Closing the reader also closes the underlying file stream
            br.close();
        }
    }
}
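What do you mean by "doesn't work"? The analyzer should be able to handle these characters just fine. Do you mean the QueryParser? If so, you can bypass it and build the query by hand, for example with a TermQuery. A TermQuery is not analyzed, so its text has to match the indexed term exactly.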
Query q = new TermQuery(new Term("field", "I+D"));
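Or do you mean the fact that StandardTokenizer splits tokens on non-word characters such as '+' or '-'? If so, you can simply use a different tokenizer (e.g. WhitespaceTokenizer) or implement your own.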
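To make the tokenizer point concrete, here is a minimal plain-Java sketch with no Lucene dependency that contrasts the two splitting strategies. The class `TokenizerSketch` and both of its methods are hypothetical stand-ins written for illustration. They only approximate what StandardTokenizer and WhitespaceTokenizer do, and they leave out stop words and stemming.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Simplified stand-ins for two tokenization strategies; this is not Lucene code. */
class TokenizerSketch {

    /** Breaks the text at every character that is not a letter or digit, then lowercases. */
    static List<String> splitOnNonWordChars(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetterOrDigit(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // A symbol such as '+' or '-' ends the current token and is dropped
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    /** Breaks the text at whitespace only, so '+' and '-' stay inside the token. */
    static List<String> splitOnWhitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Gasto en I+D";
        System.out.println(splitOnNonWordChars(text)); // [gasto, en, i, d]
        System.out.println(splitOnWhitespace(text));   // [gasto, en, i+d]
    }
}
```

With whitespace splitting, the token `i+d` survives intact, so a document and a query that pass through the same analysis can match on it. The trade-off is that punctuation attached to a word, such as a trailing comma, also stays in the token.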