

What is the best way to index data on Elasticsearch?

I have 4 tables:

  1. country
  2. state
  3. city
  4. address

These tables are related by IDs, with country as the top-level parent:

  • state.countryId
  • city.stateId
  • address.cityId

I want to integrate Elasticsearch into my application and would like to know the best way to index these tables.

Should I create one index per table, so that I have a separate index for each of country, state, city, and address?

Or should I denormalize the tables into a single index and store all the data redundantly?

ES is not afraid of redundancy in your data, so I would definitely denormalize, so that each document represents one address, like this:

{
    "country_id": 1,
    "country_name": "United States of America",
    "state_id": 1,
    "state_name": "California",
    "state_code": "CA",
    "city_id": 1,
    "city_name": "San Mateo",
    "zip_code": 94402,
    "address": "400 N El Camino Real"
}
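To produce documents of that shape from the four tables, you walk each address up its city, state, and country foreign keys. The sketch below does this in plain Python with in-memory rows; only the countryId, stateId, and cityId links come from the question, while the other column names (name, code, zip_code, address) are assumptions, and in practice the rows would come from your database.

```python
# Hypothetical in-memory rows standing in for the four SQL tables.
countries = [{"id": 1, "name": "United States of America"}]
states = [{"id": 1, "countryId": 1, "name": "California", "code": "CA"}]
cities = [{"id": 1, "stateId": 1, "name": "San Mateo"}]
addresses = [{"id": 1, "cityId": 1, "zip_code": 94402, "address": "400 N El Camino Real"}]


def denormalize(countries, states, cities, addresses):
    """Flatten the country > state > city > address chain into one document per address."""
    # Index each parent table by primary key so every lookup is a dict access.
    country_by_id = {c["id"]: c for c in countries}
    state_by_id = {s["id"]: s for s in states}
    city_by_id = {c["id"]: c for c in cities}

    docs = []
    for a in addresses:
        # Follow the foreign keys upward: address -> city -> state -> country.
        city = city_by_id[a["cityId"]]
        state = state_by_id[city["stateId"]]
        country = country_by_id[state["countryId"]]
        docs.append({
            "country_id": country["id"],
            "country_name": country["name"],
            "state_id": state["id"],
            "state_name": state["name"],
            "state_code": state["code"],
            "city_id": city["id"],
            "city_name": city["name"],
            "zip_code": a["zip_code"],
            "address": a["address"],
        })
    return docs


docs = denormalize(countries, states, cities, addresses)
print(docs[0]["city_name"])
```

Each resulting document carries its full ancestry, which is exactly the redundancy the flat approach accepts in exchange for single-index queries.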

You can then aggregate your data on whichever city, state, or country field you wish.
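As an illustration, a nested terms aggregation over the flat documents could group addresses by state and then by city. This is a sketch of a search request body, assuming a hypothetical index named `addresses` in which `state_code` and `city_name` are mapped as `keyword` fields (with default dynamic mapping you would target `state_code.keyword` and `city_name.keyword` instead).

```python
import json

# Hypothetical body for POST /addresses/_search.
# Assumes state_code and city_name are keyword fields; terms aggregations
# need exact values rather than analyzed text.
agg_query = {
    "size": 0,  # return only aggregation buckets, no individual hits
    "aggs": {
        "by_state": {
            "terms": {"field": "state_code"},
            "aggs": {
                # Sub-aggregation: city buckets inside each state bucket.
                "by_city": {"terms": {"field": "city_name"}}
            },
        }
    },
}

print(json.dumps(agg_query, indent=2))
```

Because every address document already contains its state and city, no join is needed to build these buckets.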

Your mileage may vary, since it ultimately depends on how you want to query and aggregate your data, but querying address data like this in a single index is much easier than hitting several indices.
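Loading the flat documents into that single index is typically done with the Bulk API, whose body is newline-delimited JSON: an action line followed by the document source, repeated, with a trailing newline. The helper below is a sketch that only builds that body; the index name `addresses` and the optional `id_field` parameter are hypothetical, and sending the request (for example with an official client) is left out.

```python
import json


def to_bulk_ndjson(docs, index_name, id_field=None):
    """Build a _bulk request body: one action line plus one source line per document."""
    lines = []
    for doc in docs:
        action = {"_index": index_name}
        if id_field is not None:
            # Using a stable id lets re-indexing overwrite documents instead of duplicating them.
            action["_id"] = str(doc[id_field])
        lines.append(json.dumps({"index": action}))
        lines.append(json.dumps(doc))
    # The Bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


body = to_bulk_ndjson([{"city_id": 1, "city_name": "San Mateo"}], "addresses")
print(body, end="")
```

Leaving `id_field` unset lets Elasticsearch generate ids, which is simpler but means a second run inserts duplicates.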

I like Val's answer; it is the most straightforward option. But if you really want to reduce duplication (for example, to minimize size on disk), you could use a parent-child mapping. It will make indexing and querying a bit more verbose, though. I'd still suggest going with the "flat" mapping.
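For comparison, here is a sketch of what a parent-child layout might look like using the `join` field type available in Elasticsearch 6.0 and later. The field name `hierarchy` and the `child_doc` helper are hypothetical; the mapping declares country > state > city > address as a chain of relations, and each child document names its relation and parent id.

```python
import json

# Hypothetical mapping: "hierarchy" is an illustrative join field name.
# Each key in "relations" is a parent type and its value is the child type.
join_mapping = {
    "mappings": {
        "properties": {
            "hierarchy": {
                "type": "join",
                "relations": {
                    "country": "state",
                    "state": "city",
                    "city": "address",
                },
            },
            "name": {"type": "text"},
        }
    }
}


def child_doc(relation, parent_id, **fields):
    """Build a child document body whose join field names its relation and parent id."""
    return {**fields, "hierarchy": {"name": relation, "parent": str(parent_id)}}


# A city document whose parent is state 1.
city = child_doc("city", 1, name="San Mateo")
print(json.dumps(city))
```

When indexing such a document, the request must also carry a routing value so it lands on the same shard as its ancestors; for grandchildren and deeper levels that means the routing value of the top-level country. The Elasticsearch documentation advises against modelling a relational schema with multiple levels of relations because of the query cost, which supports the suggestion to stay with the flat mapping.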

You asked, "What if you need the individual country, state, or city records?" I'd recommend adding an extra field (not_analyzed or integer) that indicates which level of the hierarchy a document represents. It is fine for such a document to lack the fields that correspond to lower levels of the hierarchy. This way you can easily apply a filter when searching only states or only countries.
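A minimal sketch of that idea, assuming a hypothetical `level` field mapped as `keyword` (the modern equivalent of a not_analyzed string): documents at higher levels simply omit the lower-level fields, and a `term` query in filter context selects one level.

```python
def level_doc(level, **fields):
    """Tag a document with the hierarchy level it represents.

    Fields for lower levels (e.g. city_name on a state document) are simply omitted.
    """
    return {"level": level, **fields}


docs = [
    level_doc("country", country_id=1, country_name="United States of America"),
    level_doc("state", country_id=1, state_id=1,
              state_name="California", state_code="CA"),
    level_doc("city", country_id=1, state_id=1, state_code="CA",
              city_id=1, city_name="San Mateo"),
]


def level_filter_query(level):
    """Search body returning only documents at the given hierarchy level."""
    # A term query in filter context: exact match on the keyword field, no scoring.
    return {"query": {"bool": {"filter": [{"term": {"level": level}}]}}}


query = level_filter_query("state")

# Local stand-in for what the filter selects on the server.
states_only = [d for d in docs if d["level"] == "state"]
print(states_only)
```

Filter context is a natural fit here because the level is an exact-match attribute that should not affect relevance scoring.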

Here is a very useful article by @adrien-grand that elaborates on the trade-offs between creating many indices versus fewer indices with many types.

Hope it helps!

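Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.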

粤ICP备18138465号  © 2020-2024 STACKOOM.COM