
Lucene.NET - checking if document exists in index

I have the following code, using Lucene.NET V4, to check if a file exists in my index.

bool exists = false;
IndexReader reader = IndexReader.Open(Lucene.Net.Store.FSDirectory.Open(lucenePath), false);
Term term = new Term("filepath", "\\myFile.PDF");
TermDocs docs = reader.TermDocs(term);
if (docs.Next())
{
   exists = true;
}

The file myFile.PDF definitely exists, but exists always comes back as false. When I look at docs in the debugger, its Doc and Freq properties state that they "threw an exception of type 'System.NullReferenceException'".

First of all, it's good practice to reuse the same IndexReader instance if you're not going to consider deleted documents: it performs better, and it is thread-safe, so you can keep it in a static read-only field. (I can see that you're passing false for the readOnly parameter, so if that is intentional, just ignore this paragraph.)
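A minimal sketch of the shared-reader idea, assuming the Lucene.NET 3.0.x API that the calls in the question (IndexReader.Open, TermDocs) appear to come from. The IndexLookup class and its member names are hypothetical, introduced only for illustration.

```csharp
using System;
using Lucene.Net.Index;

// Hypothetical helper: opens one read-only reader and reuses it for every lookup.
public sealed class IndexLookup : IDisposable
{
    private readonly IndexReader reader;

    public IndexLookup(Lucene.Net.Store.Directory directory)
    {
        // readOnly: true, because this reader is never used to delete documents.
        reader = IndexReader.Open(directory, true);
    }

    // True if at least one document contains exactly this term in the field.
    // DocFreq may still count deleted documents that have not been merged away.
    public bool Exists(string field, string value)
    {
        return reader.DocFreq(new Term(field, value)) > 0;
    }

    public void Dispose()
    {
        reader.Dispose();
    }
}
```

An instance created once (for example from FSDirectory.Open(new DirectoryInfo(lucenePath))) can be held in a static read-only field. Keep in mind that a reader only sees the index as it was when the reader was opened.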

As for your case, are you tokenizing the filepath field values? If you are (e.g. by using StandardAnalyzer when indexing/searching), you will probably have problems finding values such as \\myFile.PDF: with the default tokenizer, the value is going to be split into myFile and PDF (I'm not sure what happens to the leading backslash).

Hope this helps.

You may have analyzed the field "filepath" during indexing with an analyzer that tokenizes or otherwise changes the content. For example, the StandardAnalyzer tokenizes, lowercases, and removes stop words if any are specified.

If you only need to query with the exact filepath, as in your example, use the KeywordAnalyzer for this field during indexing.
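As a rough illustration of that suggestion, the sketch below indexes one document into an in-memory directory with KeywordAnalyzer applied to "filepath" only, then looks the exact value up as a single term. It assumes the Lucene.NET 3.0.x API; ExactPathDemo and IndexAndCheck are hypothetical names, and StandardAnalyzer as the default for other fields is an assumption.

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;
using Version = Lucene.Net.Util.Version;

public static class ExactPathDemo
{
    // Indexes one document whose "filepath" is kept as a single token,
    // then checks whether the exact value can be found as a term.
    public static bool IndexAndCheck(string path)
    {
        var directory = new RAMDirectory();

        // StandardAnalyzer for all other fields (an assumption); KeywordAnalyzer for "filepath".
        var analyzer = new PerFieldAnalyzerWrapper(new StandardAnalyzer(Version.LUCENE_30));
        analyzer.AddAnalyzer("filepath", new KeywordAnalyzer());

        using (var writer = new IndexWriter(directory, analyzer, true,
                                            IndexWriter.MaxFieldLength.UNLIMITED))
        {
            var doc = new Document();
            doc.Add(new Field("filepath", path, Field.Store.YES, Field.Index.ANALYZED));
            writer.AddDocument(doc);
        } // disposing the writer commits the document

        using (var reader = IndexReader.Open(directory, true))
        {
            // The term text must equal the indexed value exactly, including case.
            return reader.DocFreq(new Term("filepath", path)) > 0;
        }
    }
}
```

Declaring the field with Field.Index.NOT_ANALYZED should have the same effect for a single field, since it also indexes the whole value as one term without running an analyzer.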

If you can't re-index at the moment, you need to find out which analyzer was used during indexing and use it to create your query. You have two options:

  1. Use a query parser with the right analyzer and parse the query filepath:\\\\myFile.PDF. If the resulting query is a TermQuery, you can use its term as you did in your example. Otherwise, perform a search with the query.
  2. Use the Analyzer directly to create the terms from its TokenStream object. Again, if there is only one term, proceed as you did; if there are multiple terms, create a phrase query.
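Both options can be sketched roughly as follows, again assuming the Lucene.NET 3.0.x API. AnalyzedLookup and its method names are hypothetical, and the analyzer passed in is assumed to be the one that was used at index time.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Tokenattributes;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Version = Lucene.Net.Util.Version;

public static class AnalyzedLookup
{
    // Option 1: let the query parser apply the analyzer.
    // Escape() protects backslashes and other query-syntax characters.
    public static Query ParseQuery(Analyzer analyzer, string field, string value)
    {
        var parser = new QueryParser(Version.LUCENE_30, field, analyzer);
        return parser.Parse(QueryParser.Escape(value));
    }

    // Option 2: run the value through the analyzer and build the query by hand.
    public static Query BuildQuery(Analyzer analyzer, string field, string value)
    {
        var terms = new List<Term>();
        TokenStream stream = analyzer.TokenStream(field, new StringReader(value));
        ITermAttribute termAttr = stream.AddAttribute<ITermAttribute>();
        stream.Reset();
        while (stream.IncrementToken())
        {
            terms.Add(new Term(field, termAttr.Term));
        }
        stream.Dispose();

        if (terms.Count == 0) return new BooleanQuery();        // no terms: matches nothing
        if (terms.Count == 1) return new TermQuery(terms[0]);   // same lookup as in the question

        // Simplification: terms are added at consecutive positions,
        // so gaps left by removed stop words are ignored.
        var phrase = new PhraseQuery();
        foreach (Term term in terms) phrase.Add(term);
        return phrase;
    }

    // True if the query matches at least one document.
    public static bool Exists(IndexReader reader, Query query)
    {
        var searcher = new IndexSearcher(reader);
        return searcher.Search(query, 1).TotalHits > 0;
    }
}
```

For example, AnalyzedLookup.Exists(reader, AnalyzedLookup.BuildQuery(analyzer, "filepath", @"\myFile.PDF")) checks for the file using whatever terms the index-time analyzer produces for that path.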
