Why does mapping numeric data to a keyword improve retrieval times in Elasticsearch?

I'm coming from a long-term SQL background; NoSQL (and Elasticsearch) is very new to me.

An engineer on my team is constructing a new index for document storage, and they have mapped all short/int/long values to strings for use in term queries.

This surprised me, as a SQL index with a SmallInt/Int/BigInt key will perform much better than an index on that same set of values converted to VarChar(X).

I was pointed to this article: https://www.elastic.co/guide/en/elasticsearch/reference/current/number.html

Which has this comment:

Consider mapping a numeric identifier as a keyword if:

  • You don't plan to search for the identifier data using range queries.
  • Fast retrieval is important. term query searches on keyword fields are often faster than term searches on numeric fields.

I'm happy to take this at face value, but I don't understand why this is.

Assuming an exact-match query (e.g. ID = 100), can anyone speak to the mechanics of Elasticsearch (or NoSQL in general) that would explain why a query against a stringified numeric value is faster than a query against the numeric value directly?

Basically, keywords are stored in the inverted index, where looking up a term is really fast. That makes keyword the ideal type for term/terms queries (i.e. exact match).

Numeric values, however, are stored in BKD trees (since ES 5/Lucene 6), which suit numeric values better than the inverted index does and are optimized for range-like queries.

The downside is that searching for an exact numerical value within a BKD tree is less performant than looking up the term in the inverted index.
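To make the two access patterns concrete, here is a toy Python sketch. It is not how Lucene implements either structure: a dict stands in for the inverted index (term to posting list) and a sorted list with binary search stands in for a BKD tree. The document IDs and values are made up for illustration.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

# Made-up data: internal doc id -> numeric identifier stored in the document.
docs = {1: 100, 2: 250, 3: 100, 4: 975}

# Keyword-style storage: an inverted index maps each term (a string)
# directly to the sorted list of documents that contain it.
inverted = defaultdict(list)
for doc_id, value in sorted(docs.items()):
    inverted[str(value)].append(doc_id)

def term_lookup(term):
    """Exact match: find the term, then read its posting list."""
    return inverted.get(term, [])

# Numeric-style storage (simplified stand-in for a BKD tree):
# points kept sorted by value so a range is a contiguous slice.
points = sorted((value, doc_id) for doc_id, value in docs.items())
values = [v for v, _ in points]

def range_lookup(low, high):
    """Range match: binary-search both bounds, then collect the slice."""
    lo = bisect_left(values, low)
    hi = bisect_right(values, high)
    return sorted(doc_id for _, doc_id in points[lo:hi])

print(term_lookup("100"))      # exact match through the inverted index
print(range_lookup(100, 300))  # range match through the sorted points
print(range_lookup(100, 100))  # exact match expressed as a one-value range
```

In this simplified picture, an exact match against the numeric structure is just a range whose two bounds are equal, so it still goes through the range machinery instead of jumping straight to a posting list.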

So the takeaway is this: if your IDs are numeric and you plan on querying them in ranges, map them with a numeric type such as integer. But if you plan on matching your IDs in a term/exact-match fashion, store them as strings with the keyword type.
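As a rough illustration of that advice, the request bodies might look like the following, written here as Python dicts and printed as JSON. The field names order_id and quantity are hypothetical: order_id is an identifier that is only ever matched exactly, while quantity is a number that gets range-filtered.

```python
import json

# Hypothetical index mapping: the identifier is a keyword, the
# range-queried number keeps a numeric type.
mapping = {
    "mappings": {
        "properties": {
            "order_id": {"type": "keyword"},
            "quantity": {"type": "integer"},
        }
    }
}

# Exact match on the identifier: a term query against the keyword field.
# The value is sent as a string because the field is indexed as one.
term_query = {"query": {"term": {"order_id": "100"}}}

# Range filter on the numeric field.
range_query = {"query": {"range": {"quantity": {"gte": 1, "lte": 10}}}}

for body in (mapping, term_query, range_query):
    print(json.dumps(body, indent=2))
```

If the same value needs both exact-match and range access, mapping it twice (for example through a multi-field) is an option worth checking against the documentation for your Elasticsearch version.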
