Apache Lucene: indexing and searching on file paths

Question

I am using Apache Lucene to index HTML files, and I store the path of each HTML file in the Lucene index. The index is written correctly; I have checked it in Luke. But when I search for the path of a file, the number of documents returned is far too high. I want the search to match the exact path as it was stored in the index. I am using the following code.

For index creation:


    try {
        File indexDir = new File("d:/abc/");
        // create = true: build a new index, replacing any existing one
        IndexWriter indexWriter = new IndexWriter(
                FSDirectory.open(indexDir),
                new SimpleAnalyzer(),
                true,
                IndexWriter.MaxFieldLength.LIMITED);
        indexWriter.setUseCompoundFile(false);
        Document doc = new Document();
        // f is the HTML file being indexed (declared elsewhere)
        String path = f.getCanonicalPath();
        // The path is stored and also analyzed (tokenized) by SimpleAnalyzer
        doc.add(new Field("fpath", path,
                Field.Store.YES, Field.Index.ANALYZED));
        indexWriter.addDocument(doc);
        indexWriter.optimize();
        indexWriter.close();
    } catch (Exception ex) {
        ex.printStackTrace();
    }



The following is the code for searching the file path:

    File indexDir = new File("d:/abc/");
    int maxhits = 10000000;
    int len = 0;
    try {
        Directory directory = FSDirectory.open(indexDir);
        // true = open the index read-only
        IndexSearcher searcher = new IndexSearcher(directory, true);
        QueryParser parser = new QueryParser(Version.LUCENE_36, "fpath", new SimpleAnalyzer());
        // path holds the file path being searched for (declared elsewhere)
        Query query = parser.parse(path);
        query.setBoost(1.5f);
        TopDocs topDocs = searcher.search(query, maxhits);
        ScoreDoc[] hits = topDocs.scoreDocs;
        len = hits.length;
        JOptionPane.showMessageDialog(null, "items found" + len);
    } catch (Exception ex) {
        ex.printStackTrace();
    }

It reports the total number of documents in the index as the number of hits, although the path I searched for exists only once.

Answer 1

You are analyzing the path, which splits it into separate terms. The root path term (like catalog in /catalog/products/versions) likely occurs in all documents, so any search that includes catalog without making all terms mandatory will return all documents.
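To make the splitting concrete, here is a minimal sketch in plain Java that approximates what SimpleAnalyzer does (break the text at non-letter characters and lowercase each piece). It is a stand-in for illustration, not Lucene's own code; the class name PathTokens and the two example paths are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class PathTokens {
    // Approximation of SimpleAnalyzer: every run of letters becomes one
    // lowercase term; digits, slashes, colons and dots act as separators.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                terms.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            terms.add(current.toString());
        }
        return terms;
    }

    public static void main(String[] args) {
        // Hypothetical paths under the asker's d:\abc directory
        System.out.println(tokenize("d:\\abc\\docs\\page1.html"));
        System.out.println(tokenize("d:\\abc\\docs\\page2.html"));
        // Both lines print [d, abc, docs, page, html]
    }
}
```

Under this approximation the two paths produce identical term lists, because the digit that distinguishes them is discarded. Terms such as d, abc and html would then appear in every indexed path, which is consistent with a query on any one path matching the whole index.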

You need a search query like this (using the example above):

+catalog +products +versions

to force all terms to be present.
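A query string of that shape can be derived from a path mechanically. The following is a small sketch in plain Java, assuming ASCII letters only and the same letter-based splitting as above; the class name RequiredTerms and the method toQuery are hypothetical helpers, not part of Lucene.

```java
import java.util.Locale;
import java.util.StringJoiner;

class RequiredTerms {
    // Turns a path into a query string in which every term is mandatory,
    // e.g. "/catalog/products/versions" -> "+catalog +products +versions".
    static String toQuery(String path) {
        StringJoiner query = new StringJoiner(" ");
        for (String term : path.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!term.isEmpty()) {   // split() yields an empty first element for a leading separator
                query.add("+" + term);
            }
        }
        return query.toString();
    }

    public static void main(String[] args) {
        System.out.println(toQuery("/catalog/products/versions"));
        // prints: +catalog +products +versions
    }
}
```

The resulting string could be handed to parser.parse(...) in the search code. In Lucene 3.x, calling setDefaultOperator(QueryParser.AND_OPERATOR) on the parser should have a similar effect without rewriting the query text, though that is worth checking against the version in use.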

Note that this gets more complicated if the same set of terms can occur in different orders, like:

/catalog/products/versions
/versions/catalog/products/SKUs

In that case, you need to use a different Lucene tokenizer than the one in the Standard Analyzer.
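The ordering problem can be seen in a short plain-Java sketch that compares two matching rules: "all terms required, order ignored" versus "the whole path treated as one term". The class WholePathMatch and its methods are hypothetical and only model the idea. In Lucene 3.x, indexing the field with Field.Index.NOT_ANALYZED and searching it with a TermQuery is one common way to get the second behavior, but whether that fits this index is an assumption.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class WholePathMatch {
    // Letter-based splitting, as a rough model of an analyzed field.
    static Set<String> terms(String path) {
        Set<String> out = new HashSet<>();
        for (String term : path.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!term.isEmpty()) {
                out.add(term);
            }
        }
        return out;
    }

    // Models "+a +b +c": every query term must be present, in any order.
    static boolean matchesAllTerms(String storedPath, String queryPath) {
        return terms(storedPath).containsAll(terms(queryPath));
    }

    // Models an untokenized field: the whole path is a single term.
    static boolean matchesWholePath(String storedPath, String queryPath) {
        return storedPath.equals(queryPath);
    }

    public static void main(String[] args) {
        String query = "/catalog/products/versions";
        List<String> stored = Arrays.asList(
                "/catalog/products/versions",
                "/versions/catalog/products/SKUs");
        for (String path : stored) {
            System.out.println(path
                    + " allTerms=" + matchesAllTerms(path, query)
                    + " wholePath=" + matchesWholePath(path, query));
        }
    }
}
```

Both stored paths satisfy the all-terms rule, while only the first one matches when the path is compared as a single unit, which is the exact-match behavior the question asks for.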


Answer 1: accepted, 2013-02-07 12:05:42
