
Indexing database records in Lucene

I wish to index data from a few of our application's databases in Lucene. How should the index be structured? One index per table, with the columns as fields and the data as values? Or one index per database, with the varying table columns mapped to different Lucene fields? If neither, how should the index be structured so that search and maintenance do not become complicated? Assume 100 tables per database and 10K rows per table.

It completely depends on the underlying data and on how you want to query it; without knowing this, it is impossible to provide a definitive answer.

If your database schema is normalised, you may want to denormalise it somewhat so that each document is a single record combining data from more than one table.

Another factor determining the fields you assign to the document will be how you want to query the data.

For example, given the following normalised schema:

TABLE:AUTHOR        COLS:AUTHOR_ID,NAME
TABLE:BOOKS         COLS:BOOK_ID,TITLE,CONTENT
TABLE:AUTHOR_BOOKS  COLS:AUTHOR_ID,BOOK_ID

You could index a single document per author/book pair:

Document (field1:author, field2:title, field3:content)

This will allow you to search for book matches by author, title, or content.
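As a rough illustration of the denormalisation step, the sketch below joins the three tables into one flat record per author/book pair, with the keys author, title and content matching the fields of the document above. It is only a sketch under assumptions: Python and an in-memory SQLite database stand in for the real application databases, the sample rows and the function names make_sample_db and build_documents are made up for the example, and the final step of adding each record to a Lucene index is not shown.

```python
import sqlite3


def make_sample_db():
    """Create an in-memory database with the example schema and made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE AUTHOR (AUTHOR_ID INTEGER PRIMARY KEY, NAME TEXT);
        CREATE TABLE BOOKS (BOOK_ID INTEGER PRIMARY KEY, TITLE TEXT, CONTENT TEXT);
        CREATE TABLE AUTHOR_BOOKS (AUTHOR_ID INTEGER, BOOK_ID INTEGER);

        INSERT INTO AUTHOR VALUES (1, 'Ann'), (2, 'Bob');
        INSERT INTO BOOKS VALUES
            (10, 'Search Basics', 'intro to inverted indexes'),
            (20, 'Joins', 'notes on joins');
        -- Book 10 has two authors, so it yields two documents.
        INSERT INTO AUTHOR_BOOKS VALUES (1, 10), (2, 10), (2, 20);
        """
    )
    return conn


def build_documents(conn):
    """Return one flat record per author/book pair, ready to be indexed."""
    rows = conn.execute(
        """
        SELECT a.NAME, b.TITLE, b.CONTENT
        FROM AUTHOR_BOOKS ab
        JOIN AUTHOR a ON a.AUTHOR_ID = ab.AUTHOR_ID
        JOIN BOOKS b ON b.BOOK_ID = ab.BOOK_ID
        ORDER BY a.AUTHOR_ID, b.BOOK_ID
        """
    )
    # Each dict corresponds to one Document (author, title, content).
    return [
        {"author": name, "title": title, "content": content}
        for name, title, content in rows
    ]


if __name__ == "__main__":
    for doc in build_documents(make_sample_db()):
        print(doc)
```

Each returned record would then be turned into one Lucene document. Note that a book with several authors produces several documents, which is a consequence of indexing per author/book pair.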
