
Syncing pagination between relational and non-relational database

Original post: 2019-12-07 13:16:18 | Tags: mysql, database, elasticsearch, pagination, non-relational-database

I use MySQL as my main database, and I sync some data to Elasticsearch to make use of features like fuzzy search and aggregations. However, this problem applies to any pairing of a relational and a non-relational database.

When a user searches for something, I query Elasticsearch, get the ids (primary keys in MySQL), and then make another query to MySQL, filtering by the ids that Elasticsearch returned. I use this approach because you often need to load additional data from the relational database, and it would be hell to maintain these relations inside document-based Elasticsearch (e.g. loading a user along with a comment).
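The two-step lookup described above can be sketched roughly as follows. This is a minimal illustration, not the asker's actual code: `search_ids` is a hypothetical in-memory stand-in for the Elasticsearch query, the standard-library `sqlite3` module stands in for MySQL so the example runs on its own, and all table and column names are assumptions.

```python
import sqlite3

# Hypothetical stand-in for the search index: id -> searchable text.
SEARCH_INDEX = {
    1: "great product",
    2: "great service",
    3: "slow delivery",
    4: "great value",
}


def search_ids(term, page, size):
    """Stand-in for the Elasticsearch query: return one page of matching ids."""
    hits = [doc_id for doc_id, text in sorted(SEARCH_INDEX.items()) if term in text]
    start = page * size
    return hits[start:start + size]


def load_comments(conn, ids):
    """Second step: load full rows (with the joined user) by primary key."""
    if not ids:
        return []
    placeholders = ",".join("?" for _ in ids)
    sql = (
        "SELECT c.id, c.body, u.name FROM comments c "
        "JOIN users u ON u.id = c.user_id "
        f"WHERE c.id IN ({placeholders})"
    )
    rows = {row[0]: row for row in conn.execute(sql, ids)}
    # IN (...) does not preserve order, so restore the search ranking.
    return [rows[i] for i in ids if i in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, user_id INTEGER, body TEXT);
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO comments VALUES
        (1, 1, 'great product'), (2, 2, 'great service'),
        (3, 1, 'slow delivery'), (4, 2, 'great value');
    """
)

first_page = load_comments(conn, search_ids("great", page=0, size=2))
print(first_page)
```

Note that pagination happens entirely in `search_ids` here; the relational query only hydrates whatever ids it is handed.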

The problem is that the same filters will not be applied to the Elasticsearch query and the MySQL query. In the example above, what if you need to filter comments by some user parameter? That filter will be applied to the MySQL query but not to the Elasticsearch one. If the same filters aren't applied, pagination will mismatch: the 2nd page in MySQL can be the 4th in Elasticsearch. If I take all of the ids from Elasticsearch (no pagination), I am afraid of long response times and the cluster failing; besides, you can't get more than 10K records from Elasticsearch without the scroll API.
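To make the mismatch concrete, here is a small simulation in plain Python. All data and names are made up for illustration: `search_page` stands in for a paginated Elasticsearch query, and `db_filter` stands in for the MySQL query that applies the extra user-based filter.

```python
# Ten comments match the search; the search engine knows nothing about users.
# Hypothetical data: comment id -> whether the author passes the MySQL-only filter.
AUTHOR_IS_ACTIVE = {
    1: False, 2: True, 3: False, 4: False, 5: True,
    6: False, 7: True, 8: False, 9: True, 10: False,
}
PAGE_SIZE = 2


def search_page(page):
    """Stand-in for a paginated search query (unfiltered, ordered by id)."""
    ids = sorted(AUTHOR_IS_ACTIVE)
    return ids[page * PAGE_SIZE:(page + 1) * PAGE_SIZE]


def db_filter(ids):
    """Stand-in for the MySQL query that also applies the user filter."""
    return [i for i in ids if AUTHOR_IS_ACTIVE[i]]


# Filtering each search page separately yields short or empty pages.
per_page = [db_filter(search_page(p)) for p in range(5)]

# What the user expects: pages cut from the fully filtered result set.
filtered = db_filter(sorted(AUTHOR_IS_ACTIVE))
expected = [filtered[i:i + PAGE_SIZE] for i in range(0, len(filtered), PAGE_SIZE)]

print(per_page)  # [[2], [], [5], [7], [9]]
print(expected)  # [[2, 5], [7, 9]]
```

In this toy data set, the second page of filtered results, `[7, 9]`, is spread across the fourth and fifth pages of the unfiltered search results.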

I need a conceptual solution here, not actual query examples. Feel free to suggest a totally different approach. Also, I don't need a perfect pagination match, since MySQL will do the pagination anyway. If Elasticsearch needs to fetch more records, that's fine; I just don't want to cause too heavy a load.

1 Answer

I'm afraid there is no general solution to the problem you are describing. It depends on your response-time expectations, the size of your data, etc.

For example:

If you can ensure that one side of the JOIN will be much smaller, you could change the join direction: first run the query on MySQL, then do an id-based terms search in ES.
Consider using database-embedded search, as in Postgres, depending on how complex your queries are and which other ES features you are leveraging.
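As a rough sketch of the first suggestion (reversing the join direction), the relational filter can run first and its ids can then constrain the search. Everything here is an assumption for illustration: `sqlite3` stands in for MySQL, the schema and the `age` filter are invented, and the indexed field names `body` and `id` are hypothetical. The code only builds the search request body; sending it to a cluster is left out.

```python
import json
import sqlite3


def filtered_comment_ids(conn, min_age):
    """Step 1: apply the relational-only filter in SQL and return matching ids."""
    sql = (
        "SELECT c.id FROM comments c "
        "JOIN users u ON u.id = c.user_id "
        "WHERE u.age >= ? ORDER BY c.id"
    )
    return [row[0] for row in conn.execute(sql, (min_age,))]


def build_search_body(term, ids, page, size):
    """Step 2: fuzzy search restricted to those ids; pagination happens here only."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"body": {"query": term, "fuzziness": "AUTO"}}}],
                # "id" is an assumed indexed field holding the MySQL primary key.
                "filter": [{"terms": {"id": ids}}],
            }
        },
        "from": page * size,
        "size": size,
    }


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, user_id INTEGER, body TEXT);
    INSERT INTO users VALUES (1, 'alice', 34), (2, 'bob', 22);
    INSERT INTO comments VALUES
        (1, 1, 'great product'), (2, 2, 'great service'),
        (3, 1, 'slow delivery'), (4, 2, 'great value');
    """
)

ids = filtered_comment_ids(conn, min_age=30)
body = build_search_body("great", ids, page=0, size=10)
print(json.dumps(body, indent=2))
```

Because both the filter and the page window end up in a single search request, page numbers stay consistent. The trade-off is the one the answer names: this only works while the filtered id list stays small, since a terms query is capped by the `index.max_terms_count` setting (65,536 by default).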