
Solr schema design and performance

Original post, 2014-09-27 21:31:40, tagged solr

I have a books database with three entities: books, pages, and titles (titles found on a page). I am confused about, and concerned with, the performance trade-off between two approaches to the schema design:

1- Treat books as documents, i.e. a book field, a multiValued pages field, and a multiValued titles field. In this approach all of a book's data is represented in one Solr document with very large fields.

2- Treat pages as documents, which leads to much smaller fields but a much larger number of documents.

I tried looking at this official resource, but I could not find a clear answer to my question.

1 answer

Assuming you are going to take Solr results and present them through another application, I would make the smallest item, Titles, the model for documents, because that makes it much easier to show exactly where a result appears. Doing it this way also minimizes the amount of application code you need to write. If your users query Solr directly, I might use Page as my document instead; presumably you would then rely on Solr's highlighting feature to help your users see how their search term(s) matched.
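For the page-as-document variant, a highlighted query might look roughly like the sketch below. It only builds the request URL and does not send it. The core name `books`, the host, and the field names `content`, `book_name` and `page_number` are hypothetical; `hl`, `hl.fl`, `hl.snippets`, `df`, `fl` and `wt` are standard Solr request parameters.

```python
from urllib.parse import urlencode


def highlight_query(terms: str, core: str = "books",
                    base: str = "http://localhost:8983/solr") -> str:
    """Build a /select URL that searches page text and asks for highlighted snippets.

    The core name, base URL and field names are hypothetical placeholders.
    """
    params = {
        "q": terms,
        "df": "content",                   # default field to search (hypothetical name)
        "fl": "id,book_name,page_number",  # fields returned for each matching page
        "hl": "true",                      # turn on highlighting
        "hl.fl": "content",                # field(s) to build snippets from
        "hl.snippets": "3",                # up to three snippets per page
        "wt": "json",
    }
    return f"{base}/{core}/select?{urlencode(params)}"
```

The highlighting section of the JSON response is keyed by each document's unique key, so the application can line snippets up with the page that produced them.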

For Title documents I would model the schema as follows:

Book ID + Page Number + Title [string - unique key]
Book ID [integer]
Book Name [tokenized text field]
Page Number [TrieIntField]
Title [tokenized text field]
Content for that book/title/page combination [tokenized text field]
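As an illustration of the Title-level model above, here is a minimal sketch that flattens one book into one document per (book, page, title). The in-memory `Book`/`Page` classes and the field names (`id`, `book_id`, `book_name`, `page_number`, `title`, `content`) are hypothetical stand-ins for the real database and schema, which are not described in the question.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical shape of the source data; the real database schema is not given.
@dataclass
class Page:
    number: int
    # Maps each title found on the page to the text that belongs under it.
    sections: dict[str, str] = field(default_factory=dict)


@dataclass
class Book:
    book_id: int
    name: str
    pages: list[Page] = field(default_factory=list)


def title_documents(book: Book) -> list[dict]:
    """Flatten one book into one Solr document per (book, page, title)."""
    docs = []
    for page in book.pages:
        for title, content in page.sections.items():
            docs.append({
                # Composite unique key: Book ID + Page Number + Title.
                "id": f"{book.book_id}-{page.number}-{title}",
                "book_id": book.book_id,
                "book_name": book.name,
                "page_number": page.number,
                "title": title,
                "content": content,
            })
    return docs
```

The resulting list of dicts is the kind of JSON array that could be posted to Solr's update handler. Repeating the book name on every document costs some index space but lets each hit be displayed without a second lookup.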

There may be other attributes you want to capture, such as author, publication date, publisher, but you do not explain above what other information you have so I leave that out of this example.

Textual queries can then target Book Name, Title, and Content. You may want to define a single field that is indexed but not stored, and make it the destination of <copyField/> declarations in your schema.xml, so that all three can be searched at the same time.
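If the schema is managed through Solr's Schema API (available in newer Solr versions with a managed schema) rather than by hand-editing schema.xml, the catch-all field and its copy-field rules could be expressed as the JSON commands sketched below. The field names and the `text_general` type are assumptions; `text_general` ships with Solr's default configsets but may not exist in yours. The sketch only builds the request body for a POST to `/solr/<core>/schema`.

```python
import json

# Hypothetical field names; adjust to whatever your schema actually uses.
SOURCE_FIELDS = ["book_name", "title", "content"]
CATCH_ALL_FIELD = "text_all"


def catch_all_commands(field_type: str = "text_general") -> dict:
    """Build Schema API commands for an indexed, unstored catch-all field."""
    return {
        "add-field": {
            "name": CATCH_ALL_FIELD,
            "type": field_type,
            "indexed": True,
            "stored": False,      # searchable only; never returned in results
            "multiValued": True,  # several source fields are copied into it
        },
        # One copy-field command per source field, sent as an array of commands.
        "add-copy-field": [
            {"source": src, "dest": CATCH_ALL_FIELD} for src in SOURCE_FIELDS
        ],
    }


def request_body() -> str:
    """Serialize the commands as the JSON body for the Schema API endpoint."""
    return json.dumps(catch_all_commands(), indent=2)
```

The destination is marked multiValued because it receives values from several source fields. Leaving it unstored keeps the index smaller, since the original text is already stored in the individual fields.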

For indexing, without knowing more about the data, I would use the ICU Tokenizer and the Snowball Porter Stemming Filter, with the language specified on the text fields, to handle non-English data (assuming all the books are in the same language). If the books are in English, use the Standard Tokenizer instead of ICU.
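A sketch of that analysis chain as a Schema API `add-field-type` command follows. The type names are hypothetical, and the lowercase filter is my addition, included because the Snowball stemmer expects lowercased input. `solr.ICUTokenizerFactory` is not in Solr's core jars; it needs the analysis-extras contrib/module to be enabled.

```python
def text_field_type(name: str, language: str) -> dict:
    """Build an add-field-type command: tokenizer, lowercase, Snowball stemming.

    `language` is a Snowball stemmer name such as "English", "German" or "French".
    English text uses the Standard Tokenizer; other languages use the ICU Tokenizer.
    """
    tokenizer = (
        "solr.StandardTokenizerFactory" if language == "English"
        else "solr.ICUTokenizerFactory"  # requires the analysis-extras contrib/module
    )
    return {
        "add-field-type": {
            "name": name,
            "class": "solr.TextField",
            "positionIncrementGap": "100",
            "analyzer": {
                "tokenizer": {"class": tokenizer},
                "filters": [
                    # Added assumption: Snowball stemmers expect lowercased tokens.
                    {"class": "solr.LowerCaseFilterFactory"},
                    {"class": "solr.SnowballPorterFilterFactory", "language": language},
                ],
            },
        }
    }
```

Because every book is assumed to share one language, a single field type like this can back all three text fields (Book Name, Title, Content) and the catch-all field.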

Related questions

Design optimal Solr Schema

Solr Schema Design

SOLR schema design and searching

solr schema design for 100 over tables

solr schema design for many to many entity definitions

Solr schema design: fitting time-series data

Apply solr 4 schema to solr 6

SOLR performance

Solr managed-schema 'for best index size and searching performance, set “index” to false'. Why?

Solr schema for range values


The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint this page, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.



Accepted answer (solution 1), 2014-09-29 11:29:38
