Lucene.NET - checking if document exists in index
I have the following code, using Lucene.NET V4, to check if a file exists in my index.
bool exists = false;
IndexReader reader = IndexReader.Open(Lucene.Net.Store.FSDirectory.Open(lucenePath), false);
Term term = new Term("filepath", "\\myFile.PDF");
TermDocs docs = reader.TermDocs(term);
if (docs.Next())
{
exists = true;
}
The file myFile.PDF definitely exists, but the check always comes back as false. When I look at docs in the debugger, its Doc and Freq properties state that they "threw an exception of type 'System.NullReferenceException'".
First of all, it's good practice to reuse the same IndexReader instance if you don't need to account for deleted documents: it performs better and it's thread-safe, so you can keep it in a static read-only field (although I can see that you're passing false for the readOnly parameter, so if that is intended, just ignore this paragraph).
As for your case, are you tokenizing the filepath field values? If you are (e.g. by using StandardAnalyzer when indexing/searching), you will probably have problems finding values such as \\myFile.PDF (with the default tokenizer, the value is going to be split into myFile and PDF; I'm not sure about the leading backslash).
Hope this helps.
You may have analyzed the field "filepath" during indexing with an analyzer that tokenizes or otherwise changes the content. For example, the StandardAnalyzer tokenizes, lowercases, removes stopwords if specified, etc.
If you only need to query with the exact filepath, as in your example, use the KeywordAnalyzer during indexing for this field.
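A minimal sketch of that setup, assuming the Lucene.NET 3.0.x-style API that the question's code appears to use (IndexReader.Open with a readOnly flag). The class name KeywordFieldDemo, the in-memory RAMDirectory, and the choice of StandardAnalyzer as the default for other fields are illustrative assumptions, not part of the original code.

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;
using LuceneVersion = Lucene.Net.Util.Version;

// Hypothetical demo class; names are for illustration only.
public static class KeywordFieldDemo
{
    public static bool Run()
    {
        // Assumption: StandardAnalyzer for every other field,
        // KeywordAnalyzer (whole value kept as one token) for "filepath".
        var analyzer = new PerFieldAnalyzerWrapper(new StandardAnalyzer(LuceneVersion.LUCENE_30));
        analyzer.AddAnalyzer("filepath", new KeywordAnalyzer());

        var dir = new RAMDirectory(); // in-memory index, just for the sketch
        using (var writer = new IndexWriter(dir, analyzer, IndexWriter.MaxFieldLength.UNLIMITED))
        {
            var doc = new Document();
            doc.Add(new Field("filepath", "\\myFile.PDF", Field.Store.YES, Field.Index.ANALYZED));
            writer.AddDocument(doc);
        } // disposing the writer commits the document

        using (var reader = IndexReader.Open(dir, true))
        {
            // The exact, unmodified value is now a term in the index.
            // Note: DocFreq does not account for deleted documents that have not been merged away.
            return reader.DocFreq(new Term("filepath", "\\myFile.PDF")) > 0;
        }
    }
}
```

With the field kept as a single token, the exact-term lookup from the question should find the document. Indexing the field with Field.Index.NOT_ANALYZED is another way to get the same single-term behaviour.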
If you can't re-index at the moment, you need to find out which analyzer was used during indexing and use it to create your query. One option is to build the query from the query string filepath:\\\\myFile.PDF using that analyzer. If the resulting query is a TermQuery, you can use its term as you did in your example.
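As a rough sketch of that approach, again assuming the Lucene.NET 3.0.x-style API: the helper below parses the raw path with a QueryParser that uses the index-time analyzer, and reuses the term if the result is a TermQuery. The class and method names (AnalyzedLookup, Exists) are hypothetical, and the fallback search for non-TermQuery results is an assumption about what you would want in that case.

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using LuceneVersion = Lucene.Net.Util.Version;

// Hypothetical helper; 'analyzer' must be the analyzer that was used at index time.
public static class AnalyzedLookup
{
    public static bool Exists(IndexReader reader, Analyzer analyzer, string rawPath)
    {
        var parser = new QueryParser(LuceneVersion.LUCENE_30, "filepath", analyzer);

        // Escape backslashes and other query-syntax characters in the raw value,
        // then let the parser run the value through the analyzer.
        Query query = parser.Parse(QueryParser.Escape(rawPath));

        var termQuery = query as TermQuery;
        if (termQuery != null)
        {
            // Single analyzed term: look it up directly, as in the question.
            return reader.DocFreq(termQuery.Term) > 0;
        }

        // Assumption: if analysis produced several tokens (e.g. a phrase or
        // boolean query), fall back to running the parsed query.
        using (var searcher = new IndexSearcher(reader))
        {
            return searcher.Search(query, 1).TotalHits > 0;
        }
    }
}
```

Inspecting the parsed query in the debugger (for example via its ToString output) also shows which terms the analyzer actually produced for the path.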