
Elasticsearch - many small documents vs fewer large documents?

I'm creating a search-by-image system (similar to Google's reverse image search) for a cataloging system used internally at my company. We've already been using Elasticsearch successfully for our regular search functionality, so I'm planning to hash all our images, create a separate index for them, and use that index for searching. There are many items in the system, each item may have multiple images associated with it, and an item should be findable by reverse image searching any of its related images.

There are two possible schemas we've thought of (rough mappings for both are sketched below):

Making a document for each image, containing only the hash of the image and the ID of the item it belongs to. This would result in roughly 7 million documents, but they would be small, since each contains only a single hash and an ID.

Making a document for each item and storing the hashes of all of its associated images in an array on the document. This would result in around 100k documents, but each document could be fairly large; some items have hundreds of images associated with them.
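For concreteness, here is roughly what the two mappings might look like. The index, type, and field names (image-hashes, image, hash, item_id, items, item, hashes) are just placeholders, and the syntax assumes a pre-5.x Elasticsearch where exact-match strings are declared not_analyzed; on 5.x and later the equivalent would be the keyword type.

Option 1, one small document per image:

    PUT /image-hashes
    {
      "mappings": {
        "image": {
          "properties": {
            "hash":    { "type": "string", "index": "not_analyzed" },
            "item_id": { "type": "string", "index": "not_analyzed" }
          }
        }
      }
    }

Option 2, one document per item, with the hashes in an array (Elasticsearch needs no special mapping for arrays; any field can simply hold multiple values):

    PUT /items
    {
      "mappings": {
        "item": {
          "properties": {
            "hashes": { "type": "string", "index": "not_analyzed" }
          }
        }
      }
    }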

Which of these schemas would be more performant?

Having attended a recent Under the Hood talk by Alexander Reelsen, I suspect he would say "it depends" and "benchmark it".

As @Science_Fiction already hinted:

  1. Are the images updated frequently? Under your second schema, every change to an item's image set would force the whole item document to be re-indexed, which could be a significant cost (a quick illustration follows this list).
  2. On the other hand, the overhead of ~7 million documents probably shouldn't be neglected, whereas in your second scenario the hashes would just be not_analyzed terms in a single field.
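To illustrate point 1: with the image-per-document schema, adding or removing a picture only touches one tiny document, while updating an item document means Elasticsearch re-indexes the complete document, hashes array and all. A sketch, reusing the placeholder names from the question:

    PUT /image-hashes/image/img-00123
    { "hash": "a93f0c54e1b27d68", "item_id": "42" }

Under the second schema, the equivalent change is an update (effectively a full re-index) of the corresponding items document.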

If updates (point 1) are not a big factor, I would probably start with your second approach.
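Whichever schema you pick, the lookup itself stays a plain exact-match term query; what differs is what the hit gives you. A sketch with a made-up hash value, again assuming the field names above. With the per-item index, the hit is the item itself:

    POST /items/_search
    {
      "query": {
        "term": { "hashes": "a93f0c54e1b27d68" }
      }
    }

With the per-image index you would run the same term query against the hash field and read item_id out of the hits.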
