
Apache Lucene indexing

I am creating a text search application for log files using Apache Lucene. I am using the code below to index the files:

doc.add(new LongField("modified", file.lastModified(), Field.Store.NO));
doc.add(new TextField("contents", new BufferedReader(new InputStreamReader(fis, "UTF-8"))));
doc.add(new StoredField("filename", file.getCanonicalPath()));

Here I am creating 3 indexed fields for each file, but when searching I can retrieve the value of only one of them; the other two come back as null. This is the search-side code:

Document d = searcher.doc(docId);
System.out.println(i+":File name is"+d.get("filename"));
System.out.println(i+":File name is"+d.get("modified"));
System.out.println(i+":File name is"+d.get("contents"));

The output I am getting is

2 total matching documents
0:File name is/home/maclean/NetBeansProjects/LogSearchEngine/src/SimpleSearcher.java
0:File name isnull
0:File name isnull
1:File name is/home/maclean/NetBeansProjects/LogSearchEngine/src/SimpleFileIndexer.java
1:File name isnull
1:File name isnull   

What am I doing wrong?

In Lucene, you can retrieve a field's value from a search result only if that field is stored. If a field is indexed but not stored, you can still search on it, but d.get() returns null for it.

For modified, you've explicitly specified it as an un-stored field by passing the argument Field.Store.NO; as a result, its value is not stored in the index and null is returned on search. To store and retrieve its value, change the constructor call to:

doc.add(new LongField("modified", file.lastModified(), Field.Store.YES));

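For contents, the constructor you've used, TextField(String, Reader), always creates an un-stored field, and TextField has no Reader-based constructor that accepts a Field.Store argument. To store the text, read the file into a String and use the TextField(String, String, Field.Store) constructor instead (Files here is java.nio.file.Files):

doc.add(new TextField("contents", new String(Files.readAllBytes(file.toPath()), "UTF-8"), Field.Store.YES));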

After these changes, you should be able to retrieve both the fields.
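Since a stored TextField needs its value as a String rather than a Reader, the file has to be read into memory first. Below is a minimal sketch of that step using only the JDK; the class name LogFileReader and the method readUtf8 are hypothetical names chosen for illustration, not part of Lucene. The returned String is what you would pass as the second argument to TextField(String, String, Field.Store). Keep in mind that storing the full text of large log files will make the index considerably bigger, so this is probably only reasonable if you really need the contents back at search time.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: the names here are illustrative, not from Lucene.
class LogFileReader {

    // Reads the whole file into memory and decodes the bytes as UTF-8.
    static String readUtf8(Path path) throws IOException {
        return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Write a small temporary log file so the sketch runs on its own.
        Path tmp = Files.createTempFile("sample", ".log");
        Files.write(tmp, "ERROR disk full\n".getBytes(StandardCharsets.UTF_8));

        // This String is the value you would hand to the stored field.
        String contents = readUtf8(tmp);
        System.out.print(contents);

        Files.delete(tmp);
    }
}
```

In the indexing code from the question, the equivalent call would be readUtf8(file.toPath()), assuming file is the java.io.File being indexed.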

You are able to retrieve the value of filename because it is added as a StoredField, which always stores its value.

