
Lucene - Creating an Index using FSDirectory

First-time poster, long-time reader. I apologize ahead of time if this has already been asked here (I'm new to Lucene as well!). I've done a lot of research and wasn't able to find a good explanation or example for my question.

First of all, I've used IKVM.NET to convert the Java build of Lucene 4.9 so I can include it in my .NET application. I chose to do this so I could use the most recent version of Lucene. No issues.

I am trying to create a basic example to start learning Lucene and apply it to my app. I've done countless Google searches and read lots of articles, Apache's website, etc. My code mostly follows the example here: http://www.lucenetutorial.com/lucene-in-5-minutes.html

My question is: I don't believe I want to use RAMDirectory, right? I will be indexing a database and allowing users to search it via the website, so I opted for FSDirectory because I didn't think the whole index should be stored in memory.

When the IndexWriter is created, it creates new files each time (.cfe, .cfs, .si, segments.gen, write.lock, etc.). It seems to me you would create these files once and then use them until the index needs to be rebuilt?

So how do I create an IndexWriter without recreating the index files?

Code:

StandardAnalyzer analyzer;
Directory directory;
protected void Page_Load(object sender, EventArgs e)
{
    var version = org.apache.lucene.util.Version.LUCENE_CURRENT;
    analyzer = new StandardAnalyzer(version);

    if (directory == null)
    {
        directory = FSDirectory.open(new java.io.File(HttpContext.Current.Request.PhysicalApplicationPath + "/indexes"));
    }

    IndexWriterConfig config = new IndexWriterConfig(version, analyzer);

    // I found that setting the open mode overwrites the files, but it still creates new ones each time
    config.setOpenMode(IndexWriterConfig.OpenMode.CREATE);

    IndexWriter w = new IndexWriter(directory, config);
    addDoc(w, "test", "1234");
    addDoc(w, "test1", "1234");
    addDoc(w, "test2", "1234");
    addDoc(w, "test3", "1234");
    w.close();
}


private static void addDoc(IndexWriter w, String _keyword, String _keywordid)
{
    Document doc = new Document();
    doc.add(new TextField("Keyword", _keyword, Field.Store.YES));
    doc.add(new StringField("KeywordID", _keywordid, Field.Store.YES));
    w.addDocument(doc);
}

protected void searchButton_Click(object sender, EventArgs e)
{
    String querystr = "";
    String results = "";

    querystr = searchTextBox.Text.ToString();

    Query q = new QueryParser(org.apache.lucene.util.Version.LUCENE_4_0, "Keyword", analyzer).parse(querystr);

    int hitsPerPage = 100;
    DirectoryReader reader = DirectoryReader.open(directory);
    IndexSearcher searcher = new IndexSearcher(reader);

    TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true);
    searcher.search(q, collector);
    ScoreDoc[] hits = collector.topDocs().scoreDocs;

    if (hits.Length == 0)
    {
        label.Text = "Nothing was found.";
    }
    else
    {
        for (int i = 0; i < hits.Length; ++i)
        {
            int docID = hits[i].doc;
            Document d = searcher.doc(docID);

            results += "<br />" + (i + 1) + ". " + d.get("KeywordID") + "\t" + d.get("Keyword") + " Hit Score: " + hits[i].score.ToString() + "<br />";
        }
        label.Text = results;
    }

    // Close the reader whether or not there were hits
    reader.close();
}

Yes, RAMDirectory is great for quick, on-the-fly tests and tutorials, but in production you will usually want to store your index on the file system through an FSDirectory.

The reason it rewrites the index every time you open the writer is that you are setting the OpenMode to IndexWriterConfig.OpenMode.CREATE. CREATE means you want to remove any existing index at that location and start from scratch. You probably want IndexWriterConfig.OpenMode.CREATE_OR_APPEND, which opens an existing index if one is found and creates a new one otherwise.
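As a rough illustration of the CREATE_OR_APPEND behaviour described above, here is a minimal Java sketch. It assumes lucene-core and lucene-analyzers-common 4.9 are on the classpath; the class name AppendIndexSketch and the helper addAndCount are hypothetical names made up for this example. The same calls should carry over to the IKVM/C# code in the question, since it uses the same Lucene classes.

```java
import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

// Hypothetical sketch class, not part of Lucene.
public class AppendIndexSketch {
    // One explicit version constant, shared by the analyzer and the writer config.
    private static final Version VERSION = Version.LUCENE_4_9;

    /**
     * Opens the index in indexDir (creating it only if none exists),
     * adds one document, and returns the number of documents in the index.
     */
    public static int addAndCount(File indexDir, String keyword, String keywordId)
            throws IOException {
        try (StandardAnalyzer analyzer = new StandardAnalyzer(VERSION);
             Directory directory = FSDirectory.open(indexDir)) {
            IndexWriterConfig config = new IndexWriterConfig(VERSION, analyzer);
            // CREATE would wipe the existing index; CREATE_OR_APPEND keeps it.
            config.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND);

            try (IndexWriter writer = new IndexWriter(directory, config)) {
                Document doc = new Document();
                doc.add(new TextField("Keyword", keyword, Field.Store.YES));
                doc.add(new StringField("KeywordID", keywordId, Field.Store.YES));
                writer.addDocument(doc);
                writer.commit();
                return writer.numDocs();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // The "indexes" default is an assumption for this sketch.
        File indexDir = new File(args.length > 0 ? args[0] : "indexes");
        System.out.println("docs in index: " + addAndCount(indexDir, "test", "1234"));
    }
}
```

Running it repeatedly against the same directory should report a growing document count, because earlier segments are kept rather than replaced. Note that in a real application you would not re-add the same rows on every page load; indexing would normally happen in a separate step from searching.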


One minor note:

You shouldn't use LUCENE_CURRENT (it is deprecated); use a real version constant instead. You are also using LUCENE_4_0 in your QueryParser. Neither is likely to cause any major problems, but it is good to be consistent anyway.

When we use RAMDirectory, it loads the whole index, or large parts of it, into “memory”, which really means virtual memory. Because physical memory is limited, the operating system may decide to swap a large RAMDirectory out to disk. So RAMDirectory is not a good way to optimize index loading times.

On the other hand, if we don't use RAMDirectory to buffer our index and use NIOFSDirectory or SimpleFSDirectory instead, we pay a different price: our code has to make many syscalls to the OS kernel to copy blocks of data between the disk or filesystem cache and our buffers on the Java heap. This has to be done on every search request, over and over again.

To address both issues, MMapDirectory uses virtual memory and a kernel feature called “mmap” to access the files on disk.
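To make the difference between the two access styles concrete, here is a small Java sketch that uses only the standard library, not Lucene itself. The class MmapReadSketch and its two methods are hypothetical names for this example. The first method copies blocks into a heap buffer with explicit read calls, roughly the pattern attributed above to NIOFSDirectory and SimpleFSDirectory; the second maps the file into virtual memory, which is the mechanism MMapDirectory relies on. It is an illustration of the idea, not of Lucene's actual implementation.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch class, standard library only.
public class MmapReadSketch {

    /** Sums all bytes by copying blocks from the file into a heap buffer. */
    public static long sumWithHeapBuffer(Path file) throws IOException {
        long sum = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(8192); // buffer on the Java heap
            while (ch.read(buf) != -1) {                // each call copies a block into buf
                buf.flip();
                while (buf.hasRemaining()) {
                    sum += buf.get() & 0xFF;
                }
                buf.clear();
            }
        }
        return sum;
    }

    /** Sums all bytes through a memory mapping; no explicit copy into the heap. */
    public static long sumWithMmap(Path file) throws IOException {
        long sum = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // A single mapping is limited to Integer.MAX_VALUE bytes.
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            while (map.hasRemaining()) {
                sum += map.get() & 0xFF; // pages are faulted in by the OS on access
            }
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mmap-sketch", ".bin");
        Files.write(file, new byte[] {1, 2, 3, (byte) 250});
        System.out.println("heap buffer sum: " + sumWithHeapBuffer(file));
        System.out.println("mmap sum:        " + sumWithMmap(file));
        Files.deleteIfExists(file);
    }
}
```

Both methods return the same result; what differs is where the data lives while it is being read. With the mapping, the operating system's page cache backs the buffer directly, so repeated reads do not need a copy into the Java heap.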

