How to index a String in Lucene?

I'm using Lucene to index strings that I read from a document. I'm not using a Reader class, since I need to index the strings into different fields.

document.add(new Field("FIELD1","string1", Field.Store.YES, Field.Index.UN_TOKENIZED));
document.add(new Field("FIELD2","string2", Field.Store.YES, Field.Index.UN_TOKENIZED));

This works for building the index, but searching

QueryParser queryParser = new QueryParser("FIELD1", new StandardAnalyzer());
Query query = queryParser.parse(searchString);
Hits hits = indexSearcher.search(query);
System.out.println("Number of hits: " + hits.length());

doesn't return any results.

But when I index a sentence like

document.add(new Field("FIELD1","This is sentence to be indexed", Field.Store.YES, Field.Index.TOKENIZED));

searching works fine.

Thanks.

You need to set the index parameter for the single-word fields to Field.Index.TOKENIZED as well, so that their values pass through the analyzer at index time, just as the query string does at search time. The word "string1" will then be indexed as the term "string1". An untokenized field is still indexed, but as one verbatim term that bypasses the analyzer, so a query that QueryParser has run through StandardAnalyzer (which lowercases and splits the input) may never match it.

Use this:

document.add(new Field("FIELD1","string1", Field.Store.YES, Field.Index.TOKENIZED));
document.add(new Field("FIELD2","string2", Field.Store.YES, Field.Index.TOKENIZED));
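Why the index-time and query-time treatment have to agree can be illustrated with a toy model. The sketch below is plain Java, not Lucene: the class AnalyzerMismatchSketch, its methods, and the mixed-case value "String1" are all hypothetical, and analyze() is only a rough stand-in for what an analyzer such as StandardAnalyzer does (lowercasing and splitting).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy model only: this is NOT Lucene code and does not use Lucene's API.
class AnalyzerMismatchSketch {

    // Rough stand-in for an analyzer: lowercase, then split on non-alphanumerics.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!part.isEmpty()) {
                terms.add(part);
            }
        }
        return terms;
    }

    // term -> IDs of the documents that contain it
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Untokenized-style indexing: the whole value becomes one verbatim term.
    void addVerbatim(int docId, String value) {
        postings.computeIfAbsent(value, k -> new HashSet<>()).add(docId);
    }

    // Tokenized-style indexing: the value is analyzed into terms first.
    void addAnalyzed(int docId, String value) {
        for (String term : analyze(value)) {
            postings.computeIfAbsent(term, k -> new HashSet<>()).add(docId);
        }
    }

    // Query-parser-style search: the query text is analyzed before lookup.
    Set<Integer> search(String queryText) {
        Set<Integer> hits = new HashSet<>();
        for (String term : analyze(queryText)) {
            Set<Integer> docs = postings.get(term);
            if (docs != null) {
                hits.addAll(docs);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        AnalyzerMismatchSketch verbatim = new AnalyzerMismatchSketch();
        verbatim.addVerbatim(0, "String1");

        AnalyzerMismatchSketch analyzed = new AnalyzerMismatchSketch();
        analyzed.addAnalyzed(0, "String1");

        // The query "String1" is analyzed to "string1" in both cases.
        System.out.println("verbatim hits: " + verbatim.search("String1").size()); // prints 0
        System.out.println("analyzed hits: " + analyzed.search("String1").size()); // prints 1
    }
}
```

In this model the verbatim index holds the term "String1" while the analyzed query looks up "string1", so nothing matches; once the value is analyzed at index time too, both sides agree on "string1".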

When you want to index a string containing multiple words, e.g. "two words", as one searchable element without tokenizing it into two words, you can either use the KeywordAnalyzer during indexing, which takes the whole string as a single token, or use the StringField class in newer versions of Lucene.
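The difference between keyword-style and tokenizing analysis can be sketched in plain Java. This is a simplified illustration rather than Lucene code: KeywordTermSketch, keywordTerms, and tokenizedTerms are hypothetical names, and the whitespace split only approximates a real tokenizer.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy model only: this is NOT Lucene code; both methods are hypothetical helpers.
class KeywordTermSketch {

    // Keyword-style analysis: the entire input becomes exactly one term.
    static List<String> keywordTerms(String value) {
        return Collections.singletonList(value);
    }

    // Tokenizing analysis (simplified): one term per whitespace-separated word.
    static List<String> tokenizedTerms(String value) {
        return Arrays.asList(value.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        System.out.println(keywordTerms("two words"));   // prints [two words]
        System.out.println(tokenizedTerms("two words")); // prints [two, words]
    }
}
```

With the single term "two words" in the index, only a lookup for that exact string matches; a search for "two" alone does not. The query side therefore has to produce the same single term, for instance by building a TermQuery directly or by parsing the query with a KeywordAnalyzer as well.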
