Apache Lucene - Optimizing Searching

I am developing a web application in Java (using Spring) that uses a SQL Server database. I use Apache Lucene to implement a search feature for my web application. With Apache Lucene, before I perform a search I create an index of titles. I do this by first obtaining a list of all titles from the database. Then I loop through the list of titles and add each one of them to the index. This happens every time a user searches for something.

Is there a better, more efficient way of creating the index? I know my approach is very inefficient, because the whole index is rebuilt on every search and that will take a long time once the list of titles gets long.

Any suggestions would be highly appreciated.

Thanks

You should:

  1. build the Lucene index once, before you start the application
  2. update the index whenever you add, remove or update a title in your database

Benefits of this approach:

  1. only one full indexing run, done while the application is offline
  2. incremental indexing afterwards, only when relevant data changes

Before you optimize Lucene: SQL Server already has a full-text search feature. If this covers your use case then use it. It's the easiest way since SQL Server takes care of keeping the search index in sync with the database.

If the SQL Server full-text search does not fit your use case then your application has to create its own search index and keep it in sync with the database. To do this you should:

  • create / update the search index when your application starts
  • update the search index when the application inserts, updates or deletes a title
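
The two steps above can be sketched roughly as follows. This is only a minimal illustration of the pattern, not a definitive implementation: `TitleIndex`, `InMemoryTitleIndex` and `TitleService` are hypothetical names, and the in-memory map is a stand-in so the sketch runs without Lucene on the classpath. In a real Lucene-backed implementation, `upsert` would typically map to `IndexWriter.updateDocument(Term, Document)` and `delete` to `IndexWriter.deleteDocuments(Term)`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical abstraction over the search index (not a Lucene type).
interface TitleIndex {
    void upsert(long id, String title); // Lucene analogue: IndexWriter.updateDocument
    void delete(long id);               // Lucene analogue: IndexWriter.deleteDocuments
    List<Long> search(String word);
}

// Stand-in implementation: a plain map with substring matching, for illustration only.
class InMemoryTitleIndex implements TitleIndex {
    private final Map<Long, String> titles = new LinkedHashMap<>();

    public void upsert(long id, String title) {
        titles.put(id, title.toLowerCase(Locale.ROOT));
    }

    public void delete(long id) {
        titles.remove(id);
    }

    public List<Long> search(String word) {
        String needle = word.toLowerCase(Locale.ROOT);
        List<Long> hits = new ArrayList<>();
        for (Map.Entry<Long, String> entry : titles.entrySet()) {
            if (entry.getValue().contains(needle)) {
                hits.add(entry.getKey());
            }
        }
        return hits;
    }
}

// Hypothetical service: every write to the database is mirrored into the index.
class TitleService {
    private final TitleIndex index;

    TitleService(TitleIndex index) {
        this.index = index;
    }

    // Called once at application startup with all titles read from the database.
    void rebuild(Map<Long, String> allTitles) {
        for (Map.Entry<Long, String> entry : allTitles.entrySet()) {
            index.upsert(entry.getKey(), entry.getValue());
        }
    }

    // Insert or update: the database write would happen here, then the index is updated.
    void saveTitle(long id, String title) {
        index.upsert(id, title);
    }

    // Delete: the database delete would happen here, then the index entry is removed.
    void removeTitle(long id) {
        index.delete(id);
    }

    // Searching only reads the index; nothing is re-indexed per request.
    List<Long> search(String word) {
        return index.search(word);
    }
}

class TitleIndexDemo {
    public static void main(String[] args) {
        TitleService service = new TitleService(new InMemoryTitleIndex());
        Map<Long, String> fromDatabase = new LinkedHashMap<>();
        fromDatabase.put(1L, "Lucene in Action");
        fromDatabase.put(2L, "Spring Basics");
        service.rebuild(fromDatabase);          // one full indexing run at startup
        service.saveTitle(3L, "Lucene Tuning"); // incremental update
        service.removeTitle(1L);                // incremental delete
        System.out.println(service.search("lucene"));
    }
}
```

The point of the design is that `search` never touches the database or rebuilds anything, so the cost of indexing is paid once at startup and then only per changed title.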

Lucene is flexible about where it stores the search index. You can store it in a directory in your file system or in the database (or write your own storage provider). I recommend storing it in the file system, as the performance is much better than when you store it in the database.

If you don't have too many titles to index, you could also use an in-memory search index that you recreate every time your application starts.
