What is the best way to index data on Elasticsearch?

Question

I have 4 tables:

country
state
city
address

These tables are related by IDs, with country as the top-level parent:

state.countryId
city.stateId
address.cityId

I want to integrate Elasticsearch into my application and want to know the best way to index these tables.

Should I create one index per table, so that I have one index each for country, state, city and address?

Or should I denormalize the tables, create only one index, and store all the data with redundancy?

Answer 1

ES is not afraid of redundancy in your data, so I would clearly denormalize so that each document represents one address, like this:

{
    "country_id": 1,
    "country_name": "United States of America",
    "state_id": 1,
    "state_name": "California",
    "state_code": "CA",
    "city_id": 1,
    "city_name": "San Mateo",
    "zip_code": 94402,
    "address": "400 N El Camino Real"
}
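As a rough illustration of how such a document could be produced from the four tables, here is a minimal Python sketch. The in-memory dictionaries stand in for the relational tables, and the `denormalize` helper and the sample rows are hypothetical, not part of any Elasticsearch API.

```python
# Hypothetical rows standing in for the country, state, city and address tables.
countries = {1: {"name": "United States of America"}}
states = {1: {"name": "California", "code": "CA", "countryId": 1}}
cities = {1: {"name": "San Mateo", "stateId": 1}}
addresses = [
    {"id": 1, "cityId": 1, "zip_code": 94402, "address": "400 N El Camino Real"},
]


def denormalize(address):
    """Follow the parent ids upward and flatten everything into one document."""
    city_id = address["cityId"]
    city = cities[city_id]
    state_id = city["stateId"]
    state = states[state_id]
    country_id = state["countryId"]
    country = countries[country_id]
    return {
        "country_id": country_id,
        "country_name": country["name"],
        "state_id": state_id,
        "state_name": state["name"],
        "state_code": state["code"],
        "city_id": city_id,
        "city_name": city["name"],
        "zip_code": address["zip_code"],
        "address": address["address"],
    }


# One flat document per address row, ready to be sent to a single index.
docs = [denormalize(a) for a in addresses]
```

In a real application the lookups would more likely be a single SQL join, but the resulting document shape is the same.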

You can then aggregate your data on whatever city, state, or country field you wish.

Your mileage may vary, as it ultimately depends on how you want to query/aggregate your data, but it's much easier to query address data like this in a single index than to hit several indices.
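To make the aggregation point concrete, here is a small sketch that builds the body of a terms aggregation counting addresses per value of one field. It assumes the field (here `state_code`) is mapped as an exact-value, non-analyzed field; the `addresses_per` helper and the bucket name `per_value` are made up for the example. The body would be sent to the `_search` endpoint of whatever index holds the address documents.

```python
import json


def addresses_per(field, size=10):
    """Build a search body that counts address documents per distinct value of `field`."""
    return {
        "size": 0,  # return no hits, only the aggregation buckets
        "aggs": {
            "per_value": {"terms": {"field": field, "size": size}},
        },
    }


body = addresses_per("state_code")
print(json.dumps(body, indent=2))
```

Because every document already carries its city, state and country fields, the same helper works for any level of the hierarchy without touching a second index.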

Answer 2

I like Val's answer; it is the most straightforward option. But if you really want to reduce duplication (for example, to minimize size on disk), you could use parent-child mapping. It will make indexing and querying a bit more verbose, though. I would still suggest going with the "flat" mapping.

You asked, "What if you need the individual country, state, or city records?" I'd recommend adding an additional field (not_analyzed or integer) that indicates which level of the hierarchy the document represents. It is fine for a document not to have the fields that correspond to lower levels of the hierarchy. This way you can easily filter a search down to just states or just countries.
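A minimal sketch of that idea, assuming the extra field is called `level` (a name chosen here for illustration) and holds an exact-value string. Documents for higher levels simply omit the lower-level fields, and a term filter on `level` restricts a search to one kind of record.

```python
# Hypothetical documents at each level of the hierarchy; "level" is an assumed field name.
docs = [
    {"level": "country", "country_id": 1, "country_name": "United States of America"},
    {"level": "state", "country_id": 1, "country_name": "United States of America",
     "state_id": 1, "state_name": "California", "state_code": "CA"},
    {"level": "city", "country_id": 1, "country_name": "United States of America",
     "state_id": 1, "state_name": "California", "state_code": "CA",
     "city_id": 1, "city_name": "San Mateo"},
    {"level": "address", "country_id": 1, "country_name": "United States of America",
     "state_id": 1, "state_name": "California", "state_code": "CA",
     "city_id": 1, "city_name": "San Mateo",
     "zip_code": 94402, "address": "400 N El Camino Real"},
]


def level_filter(level):
    """Build a search body that only matches documents of the given hierarchy level."""
    return {"query": {"bool": {"filter": [{"term": {"level": level}}]}}}


# Local stand-in for what the filter would select on the server side.
states_only = [d for d in docs if d["level"] == "state"]
body = level_filter("state")
```

The local list comprehension is only there to show which documents the filter is meant to select; in practice the `body` would be sent as a search request.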

Answer 3

Here is a very useful article by @adrien-grand which elaborates on the trade-offs between creating many indexes and creating fewer indexes with many types.

Hope it helps!

3 answers

Answer 1
2 2016-05-11 11:20:16

Answer 2
0 2016-05-11 20:51:45

Answer 3
0 2016-05-13 14:35:20
