
Choosing the Right Solution for Search and Indexing

Original post: 2020-05-28 12:50:43. Tags: elasticsearch, search-engine, lucene.net, search-engine-api

We are working on the design and development of a headless application. We are currently facing an **architectural question** that we need to answer before we can proceed with the system design. We are not experts in **search engines**, but we are researching this area.

Our tech stack is .NET Core/SQL Server, and in the future we may use RavenDB.

Instead of using a content delivery API, we plan to use query-based content delivery to make it more flexible and to reduce the overhead of developing an API for each front-end framework. We also decided to use indexes for the majority of data management, i.e. to reduce the DB load. So basically most content operations will be handled using the indexes.

The problem we observed with the search engine: at first we planned to use Elasticsearch, but then we ran into the following issues.

The system will have dynamic field management and field data management, i.e. users will be editing the fields and the field values while the system is running. Each time, we may need to rebuild the index to update the field in Elasticsearch (we are not experts in search engines). This would increase the network load, which may not be feasible for us in a large multitenant environment.

So we decided to go with Lucene.NET, but before proceeding with Lucene.NET we want to make sure the following things can be solved.

Updating fields dynamically without rebuilding the index each time: does Lucene support this, or can we customize it to manage this?

The second issue is managing separate indexes for each tenant in a distributed architecture.

We plan to have a partition for each tenant in production so that the data will not be in a single index. This is because we don't want to put a high load on the web server for managing permission-based query results; instead, Lucene will do this. The results of any query will be returned based on the permissions of the user who issued it, so it is better to have a separate index for each tenant to reduce the operations.

Is it possible to have a distributed Lucene implementation with a partition dedicated exclusively to each tenant?

So kindly help us find a solution to the above two problems that we are facing right now.

1 Answer

Elasticsearch uses Lucene internally: every Elasticsearch index is made up of one or more shards, and each shard is internally a Lucene index. You can even think of Elasticsearch as a distributed Lucene that can easily be scaled to thousands of physical servers.

This should clear up any doubt about part 1 of your question: in Elasticsearch, all the low-level operations such as updating or deleting a document are performed internally by Lucene.

Your first question:

Q: Updating field dynamically without rebuilding indexing each time, does Lucene support this or can we customize to manage this?

You are only updating a single document, so it will not cause the entire index to be rebuilt. The updated document becomes searchable within 1 second (the default refresh interval). If you need the updated document to be visible immediately, you can issue an explicit refresh (not recommended).
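As a rough illustration of what a single-document update looks like, here is a minimal Python sketch that only builds the request for the Elasticsearch Update API (`POST /<index>/_update/<id>`) and does not send it. It assumes Elasticsearch 7 or later; the helper name `build_partial_update`, the index name `content`, the document id, and the field names are all made up for the example.

```python
import json
from urllib.parse import quote


def build_partial_update(index, doc_id, changed_fields, wait_for_refresh=False):
    """Describe an Elasticsearch partial-update request without sending it.

    Returns a (method, path, body) tuple for the Update API.
    Only the fields in `changed_fields` are supplied by the caller.
    """
    path = f"/{quote(index, safe='')}/_update/{quote(str(doc_id), safe='')}"
    if wait_for_refresh:
        # Wait for the next scheduled refresh to make the change searchable,
        # rather than forcing an explicit refresh on every write.
        path += "?refresh=wait_for"
    body = json.dumps({"doc": changed_fields}, sort_keys=True)
    return "POST", path, body


# Hypothetical index, document id, and field names.
method, path, body = build_partial_update(
    "content", "article-42", {"title": "New title", "custom_color": "blue"}
)
print(method, path)
print(body)
```

The request touches one document only. If the caller must see its own write right away, `refresh=wait_for` is usually a gentler option than an explicit refresh, since it waits for the regular refresh cycle instead of triggering an extra one.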

Coming to your second question:

Q: Is it possible to have distributed Lucene implementation by having a partition for each tenant exclusively?

Answer: As explained, you can think of Elasticsearch as a distributed Lucene, and you can easily create a separate index for each tenant; the indices won't interfere with each other's data. However, if you store multiple indices on the same Elasticsearch cluster, there is no isolation of infrastructure resources (CPU, memory, etc.), so you can run into the noisy-neighbor problem.
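To make the index-per-tenant idea concrete, here is a small Python sketch that maps a tenant id to its own index name and builds (without sending) the create-index and search requests for that tenant. The `tenant-` prefix, the helper names, and the shard/replica counts are assumptions for illustration, and the name check is deliberately stricter than what Elasticsearch itself requires.

```python
import json
import re

# Conservative subset of the characters Elasticsearch allows in index names
# (index names must be lowercase and may not contain characters like / or *).
_SAFE_TENANT_ID = re.compile(r"[a-z0-9][a-z0-9_-]*")


def tenant_index(tenant_id, prefix="tenant-"):
    """Return the name of the index dedicated to one tenant."""
    normalized = tenant_id.strip().lower()
    if not _SAFE_TENANT_ID.fullmatch(normalized):
        raise ValueError(f"tenant id not usable as an index name: {tenant_id!r}")
    return prefix + normalized


def build_create_index(tenant_id, shards=1, replicas=1):
    """Build a create-index request; each shard is one Lucene index."""
    body = {"settings": {"number_of_shards": shards, "number_of_replicas": replicas}}
    return "PUT", f"/{tenant_index(tenant_id)}", json.dumps(body, sort_keys=True)


def build_tenant_search(tenant_id, field, text):
    """Build a search request that can only hit the given tenant's index."""
    body = {"query": {"match": {field: text}}}
    return "POST", f"/{tenant_index(tenant_id)}/_search", json.dumps(body)


# Hypothetical tenant and field names.
print(build_create_index("Acme"))
print(build_tenant_search("Acme", "title", "lucene"))
```

Because the index name is derived from the tenant id on the server side, a query for one tenant never addresses another tenant's index. This separates the data only; indices on the same cluster still share CPU and memory, so the noisy-neighbor caveat above still applies.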