Lucene: how to index file names
I'm a new Lucene user and I'm trying to learn the basics.
I have three files:
apache_empty.txt (empty file)
apache.txt (contains many 'apache' tokens)
other.txt (contains just one token, 'apache')
When I search for 'apache', I get only apache.txt and other.txt in the results, but I also want to get apache_empty.txt, which has the searched word in its name...
This is how I add documents to the index:
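protected Document getDocument(File f) throws Exception
{
    Document doc = new Document();
    // Reader-valued field: the file content is tokenized and indexed, but not stored.
    Field contents = new Field("contents", new FileReader(f));
    // NOT_ANALYZED: the whole value is indexed as a single term.
    Field parent = new Field("parent", f.getParent(), Field.Store.YES, Field.Index.NOT_ANALYZED);
    // ANALYZED: the file name is passed through the analyzer before indexing.
    Field filename = new Field("filename", f.getName(), Field.Store.YES, Field.Index.ANALYZED);
    Field fullpath = new Field("fullpath", f.getCanonicalPath(), Field.Store.YES, Field.Index.NOT_ANALYZED);
    // Give matches in the file name more weight than matches in other fields.
    filename.setBoost(2.0F);
    doc.add(contents);
    doc.add(parent);
    doc.add(filename);
    doc.add(fullpath);
    return doc;
}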
How can I make Lucene index the file names as well?
To match on a prefix, search with a wildcard: apache* also matches the file name apache_empty.txt. If filename is not the default field of your query parser, qualify the term as filename:apache*. For the complete syntax, see the Apache Lucene Query Parser documentation.
An alternative is to use an analyzer that treats the underscore as a word separator, so that apache_empty.txt is indexed with apache as a separate token.
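The effect of such an analyzer can be approximated without touching Lucene's analyzer API. The following is only a sketch in plain Java, not the answer's actual implementation: it splits a file name on every character that is not a letter or digit (so both '_' and '.' act as separators) and joins the pieces with spaces. The class name FileNameTokens and the methods tokenize and toIndexableText are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Lucene.
final class FileNameTokens {

    // Splits a file name into lowercase word tokens, treating every
    // character that is not a letter or digit (including '_' and '.')
    // as a separator.
    static List<String> tokenize(String fileName) {
        List<String> tokens = new ArrayList<>();
        for (String part : fileName.split("[^\\p{L}\\p{N}]+")) {
            // split() can yield an empty first element when the name
            // starts with a separator, so skip empty pieces.
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Joins the tokens with single spaces, giving text that a
    // whitespace-splitting analyzer would break into the same tokens.
    static String toIndexableText(String fileName) {
        return String.join(" ", tokenize(fileName));
    }

    public static void main(String[] args) {
        System.out.println(tokenize("apache_empty.txt"));        // prints [apache, empty, txt]
        System.out.println(toIndexableText("apache_empty.txt")); // prints apache empty txt
    }
}
```

Assuming the analyzer in use splits on whitespace, passing FileNameTokens.toIndexableText(f.getName()) instead of f.getName() as the value of the filename field in getDocument should let a plain query for apache on that field match apache_empty.txt. The stored value of filename would then be the split text, so the original name would have to be read from fullpath.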