
What is the impact of the _source field in Elasticsearch on large documents (extracted PDF books, documents, etc.)?

I have looked at the _source field that each document ingested into Elasticsearch gets. The _source field seems to be a stored field.

I am concerned that the _source field stores every field of the document I submit, especially since the body of each document is a fairly large chunk of text.

My questions are:

Will keeping such a large chunk of text as a stored field slow down segment merges or otherwise hurt indexing, given that the rate of incoming documents can also be quite high?

Is it a better option to have the feeding process stage the content that _source would hold (specifically the body), so that I can re-ingest on schema changes (which is touted as the advantage of having the _source field)?

The _source field is one of the things that differentiates Elasticsearch from Solr. In Elasticsearch, it exists to make the developer's job easier, but it does have costs and trade-offs. You can turn the field off, but then you lose those advantages.
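As a rough illustration of what "turning the field off" can look like, here is a minimal sketch that builds a create-index request body in Python. It relies on the `_source` mapping options `enabled` and `excludes`. The helper name `build_index_body` and the field names `title` and `body` are made up for this example, and the typeless mapping layout assumes Elasticsearch 7 or later (older versions nest mappings under a type name).

```python
import json


def build_index_body(exclude_fields=None, disable_source=False):
    """Build a create-index body that controls what _source keeps.

    Hypothetical helper for illustration only; 'title' and 'body' are
    assumed field names, not taken from any real schema.
    """
    if disable_source:
        # Drop _source entirely: no reindex-from-source, no update API,
        # and no original JSON returned with search hits.
        source_cfg = {"enabled": False}
    else:
        # Keep _source but leave the listed fields out of it. Those fields
        # are still indexed and searchable, just not stored in _source.
        source_cfg = {"excludes": list(exclude_fields or [])}

    return {
        "mappings": {
            "_source": source_cfg,
            "properties": {
                "title": {"type": "text"},
                "body": {"type": "text"},
            },
        }
    }


if __name__ == "__main__":
    # Keep small fields in _source, but not the large extracted text.
    print(json.dumps(build_index_body(exclude_fields=["body"]), indent=2))
```

The resulting dictionary would be sent as the body of the index creation request. If the body is excluded this way, the feeding process has to keep its own copy of the text, since it can no longer be recovered from Elasticsearch for re-ingestion.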

On the other hand, Solr is a lot more explicit (verbose) about each step, but you get to choose the trade-offs with more granularity as well.

Under the covers, indexing and searching are actually remarkably similar in the two systems, despite the very different syntax.
