What is the impact of the _source field in ElasticSearch with regards to large documents (extracted PDF Books, documents etc)?

Original post 2015-12-11 20:52:43 8 1 performance/ elasticsearch

I have looked at the _source field that each document ingested into Elasticsearch gets. The _source field seems to be a stored field.

I am concerned that the _source field stores all of the fields of the document I submit, especially since the body of each document is a fairly large chunk of text.

My questions are:

Will having such a large chunk of text as a stored field slow down segment merges or otherwise hurt indexing, given that the rate of incoming documents can also be quite high?

Is it a better option to have the feeding process stage the content that would go into the _source field (specifically the body), so that I can re-ingest on schema changes (which is touted as the advantage of having the _source field)?

1 answer

The _source field is one of the things that differentiate Elasticsearch from Solr. In Elasticsearch, it exists to make the developer's job easier, but it does come with costs and trade-offs. You can turn the field off, but then you lose those advantages.

On the other hand, Solr is a lot more explicit (verbose) about each step, but you get to choose the trade-offs with more granularity as well.

Under the covers, indexing and searching are actually remarkably similar in the two, despite the very different syntax.
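To make the "turn the field off" option concrete, here is a minimal sketch (not part of the original answer) that builds two index-creation bodies: one that disables _source entirely, and one that keeps _source but excludes only the large text field. It assumes the typeless mapping layout of Elasticsearch 7 and later; older versions, including those current when this question was asked, nest the same settings under a type name. The field names `title` and `body` and the helper `build_index_body` are hypothetical.

```python
import json


def build_index_body(source_mode="exclude_body", body_field="body"):
    """Return an index-creation body that controls what _source keeps.

    source_mode:
      "disabled"     -> _source is not stored at all
      "exclude_body" -> _source is stored, minus the large body field
      anything else  -> _source is stored in full (the default behaviour)
    """
    if source_mode == "disabled":
        source_cfg = {"enabled": False}
    elif source_mode == "exclude_body":
        # The excluded field is still indexed and searchable,
        # it is just not kept in the stored _source.
        source_cfg = {"excludes": [body_field]}
    else:
        source_cfg = {"enabled": True}

    return {
        "mappings": {
            "_source": source_cfg,
            "properties": {
                "title": {"type": "text"},     # hypothetical field
                body_field: {"type": "text"},  # hypothetical large field
            },
        }
    }


if __name__ == "__main__":
    # Print the JSON that would be sent when creating the index.
    print(json.dumps(build_index_body("exclude_body"), indent=2))
```

Either variant means the original body can no longer be read back from Elasticsearch, so features that rely on the stored source (the update and reindex APIs, for example) are lost or return incomplete documents. That is why the feeding process would need to stage the original content itself, as the question suggests.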

Elasticsearch - many small documents vs fewer large documents?

ElasticSearch retrieves documents slowly

Mongo updating large documents

What are the performance considerations when adding a large number of documents to a large Solr core?

What is the impact of compression on query performance in Elasticsearch?

Performance impact of big field name and date type in Elasticsearch

ElasticSearch poor query performance one 100K documents dataset

How to limit elasticsearch to a list of documents each identified by a unique keyword

MongoDB embedded vs reference schema for large data documents

_id field vs indexed date field to obtain latest documents

The technical posts on this site follow the CC BY-SA 4.0 license. If you reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

solution1 0 2015-12-12 02:52:38