How do I search a Field.Index.NOT_ANALYZED field in Lucene.NET?

Question

I am new to Lucene.NET. I am adding fields as

Field.Index.NOT_ANALYZED

in a Lucene document. There is one default field, which is added to the document as

Field.Index.ANALYZED

I have no difficulty searching the default field, but when I search on a specific field Lucene returns 0 documents. However, if I change

Field.Index.NOT_ANALYZED

to

Field.Index.ANALYZED

then things work properly. I think it has something to do with the Analyzer. Can anybody guide me on how to search a Field.Index.NOT_ANALYZED field?

Here is how I am creating the query parser:

QueryParser parser = 
    new QueryParser(
        Version.LUCENE_30, 
        "content", 
        new StandardAnalyzer(Version.LUCENE_30));

Answer 1

ANALYZED just means that the value is passed through an Analyzer before being indexed, while NOT_ANALYZED means that the value is indexed as-is. The latter means that a value like "hello world" is indexed as exactly that: the single term "hello world". However, the QueryParser syntax treats whitespace as a term separator, so the same input is parsed into two terms, "hello" and "world", neither of which equals the indexed term.

You will be able to match the field if you create the query directly with var q = new TermQuery(new Term(field, "hello world")) instead of calling var q = queryParser.Parse(field, "hello world").
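The difference between the two approaches can be sketched as follows. This is a minimal example, assuming Lucene.NET 3.0.3 and an in-memory RAMDirectory; the field name "title", the class name, and the helper methods are hypothetical and chosen only for illustration.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;
using LuceneVersion = Lucene.Net.Util.Version;

public static class NotAnalyzedSearchDemo
{
    // Builds a one-document in-memory index; "title" is a hypothetical field name.
    public static RAMDirectory BuildIndex(StandardAnalyzer analyzer)
    {
        var directory = new RAMDirectory();
        using (var writer = new IndexWriter(directory, analyzer, IndexWriter.MaxFieldLength.UNLIMITED))
        {
            var doc = new Document();
            // NOT_ANALYZED: the whole string becomes one term, "hello world".
            doc.Add(new Field("title", "hello world", Field.Store.YES, Field.Index.NOT_ANALYZED));
            writer.AddDocument(doc);
        }
        return directory;
    }

    // Returns the number of hits for the given query.
    public static int CountHits(RAMDirectory directory, Query query)
    {
        using (var searcher = new IndexSearcher(directory, true))
        {
            return searcher.Search(query, 10).TotalHits;
        }
    }

    public static void Main()
    {
        var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_30);
        var directory = BuildIndex(analyzer);

        // The parser splits on whitespace, so this becomes title:hello plus content:world.
        var parser = new QueryParser(LuceneVersion.LUCENE_30, "content", analyzer);
        Query parsed = parser.Parse("title:hello world");
        Console.WriteLine(parsed + " -> " + CountHits(directory, parsed));

        // A TermQuery bypasses the parser and the analyzer and looks up the exact term.
        Query exact = new TermQuery(new Term("title", "hello world"));
        Console.WriteLine(exact + " -> " + CountHits(directory, exact));
    }
}
```

Under these assumptions the parsed query should find nothing, because neither "hello" nor "world" exists as a term on its own, while the TermQuery should find the document.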

Answer 2

The issue seems to be that the search values do not literally match the values in the index; in other words, you are trying to match a document containing hello world with a search for Hello World. Since all your fields are marked as NOT_ANALYZED, Lucene does not process the values with an analyzer/tokenizer; it simply indexes them as they are passed in, storing a string like hello world as the single term hello world. For a search to return a match on that document, the search term needs to be exactly

hello world

and not Hello World, hello world. (with a trailing period), or Hello. All of these searches will return zero matches. For Lucene, it would be like searching for the number 3 and expecting a match for 2 or 4 (as illogical as that might sound).

This is why NOT_ANALYZED is recommended only for ID-type fields, where you want the search to return an exact match rather than a list of related or similar field values.
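If an ID-type field still has to be searchable through QueryParser, one commonly used option is to give that field a KeywordAnalyzer at query time, so that a quoted value is kept as a single term. The sketch below assumes Lucene.NET 3.0.3; the field names "sku" and "content", the sample values, and the class name are hypothetical.

```csharp
using System;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;
using LuceneVersion = Lucene.Net.Util.Version;

public static class KeywordFieldParserDemo
{
    // Counts hits for a parser query string against a one-document in-memory index.
    public static int CountHits(string queryText)
    {
        // StandardAnalyzer by default, KeywordAnalyzer for the hypothetical "sku" field.
        var analyzer = new PerFieldAnalyzerWrapper(new StandardAnalyzer(LuceneVersion.LUCENE_30));
        analyzer.AddAnalyzer("sku", new KeywordAnalyzer());

        var directory = new RAMDirectory();
        using (var writer = new IndexWriter(directory, analyzer, IndexWriter.MaxFieldLength.UNLIMITED))
        {
            var doc = new Document();
            doc.Add(new Field("sku", "AB 123", Field.Store.YES, Field.Index.NOT_ANALYZED));
            doc.Add(new Field("content", "Blue widget", Field.Store.YES, Field.Index.ANALYZED));
            writer.AddDocument(doc);
        }

        using (var searcher = new IndexSearcher(directory, true))
        {
            var parser = new QueryParser(LuceneVersion.LUCENE_30, "content", analyzer);
            return searcher.Search(parser.Parse(queryText), 10).TotalHits;
        }
    }

    public static void Main()
    {
        // Quoted so the parser hands the whole value to KeywordAnalyzer as one term.
        Console.WriteLine(CountHits("sku:\"AB 123\""));
        // Different casing is a different term, so this is not expected to match.
        Console.WriteLine(CountHits("sku:\"ab 123\""));
        // The analyzed field still goes through StandardAnalyzer.
        Console.WriteLine(CountHits("content:widget"));
    }
}
```

The quotes matter here: without them the parser would again split the value on whitespace before any analyzer sees it. The match on "sku" remains exact and case-sensitive, which is the behavior the answer describes for ID-type fields.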

The advantage of using ANALYZED is that the search becomes more intuitive and friendly. Indexing a value like hello world breaks it down into tokens, which allows matches on individual words such as hello or world, and an analyzer such as StandardAnalyzer lowercases those tokens to avoid mismatches caused by different casing (like Hello World or hELLO). A substring such as ello would still need something extra, for example a wildcard query or n-gram analysis.
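A small sketch of the analyzed case, assuming Lucene.NET 3.0.3 with StandardAnalyzer used for both indexing and query parsing; the class name and the sample value are hypothetical, and "content" is the default field from the question.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;
using LuceneVersion = Lucene.Net.Util.Version;

public static class AnalyzedFieldDemo
{
    // Counts hits for a parser query string against a one-document in-memory index.
    public static int CountHits(string queryText)
    {
        var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_30);
        var directory = new RAMDirectory();
        using (var writer = new IndexWriter(directory, analyzer, IndexWriter.MaxFieldLength.UNLIMITED))
        {
            var doc = new Document();
            // ANALYZED: StandardAnalyzer tokenizes and lowercases, indexing "hello" and "world".
            doc.Add(new Field("content", "Hello World", Field.Store.YES, Field.Index.ANALYZED));
            writer.AddDocument(doc);
        }

        using (var searcher = new IndexSearcher(directory, true))
        {
            // The same analyzer is applied to the query text, so casing no longer matters.
            var parser = new QueryParser(LuceneVersion.LUCENE_30, "content", analyzer);
            return searcher.Search(parser.Parse(queryText), 10).TotalHits;
        }
    }

    public static void Main()
    {
        Console.WriteLine(CountHits("hello"));       // single word, lowercase
        Console.WriteLine(CountHits("hELLO"));       // single word, mixed case
        Console.WriteLine(CountHits("Hello World")); // both words
        Console.WriteLine(CountHits("ello"));        // bare substring, not an indexed term
    }
}
```

With this setup the first three queries should each find the document, while the bare substring should not, because only whole lowercased tokens are in the index.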


Answer 1: 12, accepted, 2013-07-03 13:18:36
Answer 2: 2, 2013-07-03 15:24:17
