How to get a cursor for pagination in GraphQL from a database?

I am having terrible problems getting a real cursor for resolving a database pagination result in GraphQL. No matter what kind of database I am using (SQL, e.g. MySQL, or a NoSQL document store, e.g. MongoDB), I don't seem to be able to get a cursor or cursor-like object.

Probably I am missing some fundamental concepts, but after searching my b... off I am beginning to seriously doubt whether the official GraphQL pagination documentation

https://graphql.org/learn/pagination/

is based on any real-life experience at all.

Here's my question: How can I get anything even remotely resembling a cursor from a SQL query like this?

SELECT authors.id, authors.last_name, authors.created_at FROM authors
ORDER BY authors.last_name, authors.created_at
LIMIT 10
OFFSET 20

I know that offset-based pagination should not be used and that cursor-based navigation is considered the remedy. And I'd definitely like to cure my application of the offset disease. But in order to do that I need to be able to retrieve a cursor from somewhere.

I also understand (I forgot where I read it) that primary keys should not be used for pagination either.

So, I am stuck here.

I think you're being down-voted for asking a good question. The first/last/before/after concept is difficult to implement in SQL.

I've been breaking my head over the same problem. The pagination documentation does not address how to define cursors when you are applying custom ORDER statements.

And I haven't really found a comprehensive solution online either. I found some posts where people address the issue, but the answers are only partially correct or partially complete ("just base64-encode the ID field to make a cursor" seems to be the go-to answer, but that says little about what the query actually has to do to compute the cursor). Also, any solutions involving row_number are quite ugly and not applicable across different SQL dialects. So let's try it differently.

Quick disclaimer: this is going to be a fairly comprehensive post, but if your back-end uses a decent query builder, you could technically program a method that implements the first/last/before/after pagination required by Relay GraphQL on top of ANY pre-existing query. The only requirement is that the tables you are sorting on all have a column that uniquely represents the default order of the records. (Usually, if your primary key is an integer and uses auto-generated IDs, you can use that one, even though technically ordering a table by its primary key will not always yield the same result as returning the table unordered.)

Forget about base64 for a moment and just assume the ID to be a valid cursor field that represents the default order of the table.

The answer you find online for using a cursor is usually this:

SELECT * FROM TABLE T
WHERE T.id > $cursorId;

Well, this works great for getting all the entries after the cursor, AS LONG as you don't apply any other sorts to the query. Once you use a custom sort like in your example, this suggestion breaks down.
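To make the breakdown concrete, here is a small sketch using Python's built-in sqlite3 module. The four-row authors table and its contents are made up for illustration; the point is only that "id greater than the cursor" and "comes after the cursor in the sorted list" are different sets once a custom sort is involved.

```python
import sqlite3

# Illustrative data only: a tiny in-memory table, not the asker's schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, last_name TEXT)")
conn.executemany(
    "INSERT INTO authors (id, last_name) VALUES (?, ?)",
    [(1, "Young"), (2, "Adams"), (3, "Moore"), (4, "Baker")],
)

# Full list in the custom order: Adams(2), Baker(4), Moore(3), Young(1).
ordered = [r[0] for r in conn.execute("SELECT id FROM authors ORDER BY last_name, id")]

# Naive cursor: "everything with a bigger id than the cursor row" (cursor = Baker, id 4).
naive = [r[0] for r in conn.execute(
    "SELECT id FROM authors WHERE id > ? ORDER BY last_name, id", (4,))]

# What the client actually expects: the rows that follow Baker in the sorted list.
expected = ordered[ordered.index(4) + 1:]

print("naive:", naive, "expected:", expected)
```

With this data the naive query returns nothing at all, while two rows really do follow the cursor in the sorted list.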

However, the core logic in there can be re-applied to queries with sorts; the solution just needs to be broadened. Let's try to come up with the complete algorithm.


Algorithm for first n after c (first n nodes after cursor)

A node or edge is the same as a row in SQL terminology (if 1 row represents a single entity, such as 1 author).

The cursor is the row after which we start returning sibling rows, be it forwards or backwards.

Given C is the cursor.

A is any other row being compared to C.

T is the table of which both A and C are rows.

And v, w, x, y, z are 5 columns on table T; naturally both A and C have these columns.

The algorithm has to decide whether A is included in or excluded from the query result, based on the cursor object, the given n, and the provided orders of these 5 columns.

Let's start with a single order.

Given there is 1 order (v), which there should always be at the very least if we assume our table to be ordered by its primary key by default: to show the first n records we need to apply a limit of n, which is trivial. The difficult part is after c.

For a table which is only being ordered by 1 field, that comes down to:

 SELECT A FROM T
 WHERE A.v > C.v
 ORDER BY T.v ASC
 LIMIT n

This shows all rows which have a bigger v than C and removes all rows whose v is smaller than or equal to that of C, meaning there will not be any rows left from before C. If we assume the primary key correctly represents the natural order, we can drop the ORDER BY statement (SQL does not strictly guarantee any row order without it, so keeping it is the safe choice). A slightly more readable version of this query then becomes:

 SELECT A FROM T
 WHERE A.id > $cursorIdGivenByClient
 LIMIT n

And there, we've arrived at the simplest solution for providing a cursor to an 'unsorted' table. It is the same solution as the commonly accepted answer for dealing with cursors, but alas, it is incomplete.

Now let's look at a query that is sorted by two columns (v and w):

 SELECT A FROM T
 WHERE A.v > C.v
 OR (A.v = C.v AND A.w > C.w)
 ORDER BY T.v ASC, T.w ASC
 LIMIT n

We start off with the same WHERE A.v > C.v: any row whose value for the first sort column (A.v) is less than that of C (C.v) is removed from the output. However, if the first sort column v has the same value for both A and C (A.v = C.v), we need to look at the second sort column to see whether A is still allowed in the query result, which is the case if A.w > C.w.
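The two-column case can be tried out directly. The following is a minimal sketch with Python's sqlite3 module; the table t, its five rows, and the cursor values are invented for the demonstration, and the cursor row C is passed in as named parameters rather than as a second table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (v INTEGER, w INTEGER)")
# Rows inserted out of order on purpose; the query has to sort them.
conn.executemany("INSERT INTO t VALUES (?, ?)", [(2, 1), (1, 3), (1, 1), (2, 2), (1, 2)])

# Cursor row C = (v=1, w=2); we want the first 2 rows after it.
cursor_v, cursor_w = 1, 2
page = conn.execute(
    """
    SELECT v, w FROM t
    WHERE v > :cv OR (v = :cv AND w > :cw)
    ORDER BY v ASC, w ASC
    LIMIT :n
    """,
    {"cv": cursor_v, "cw": cursor_w, "n": 2},
).fetchall()

print(page)
```

The row (1, 3) qualifies through the tie-break on w, and (2, 1) qualifies through the first comparison on v alone.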

Let's move on to a query with 3 sorts:

 SELECT A FROM T
 WHERE A.v > C.v
 OR (A.v = C.v AND A.w > C.w)
 OR (A.v = C.v AND A.w = C.w AND A.x > C.x)
 ORDER BY T.v ASC, T.w ASC, T.x ASC
 LIMIT n

This is the same logic as for 2 sorts, just worked out a little further. If the first column is the same, we need to look at the 2nd column to see which one is bigger. If the second column is ALSO the same, we need to look at the 3rd column. It's important to realize that the primary key is always the last sort column in the ORDER BY statement, and the last condition to be compared against. In this case that is A.x > C.x (or A.id > $cursorId).

Anyway, a pattern should start to emerge. For sorting on 4 columns the query would look like this:

 SELECT A FROM T
 WHERE A.v > C.v
 OR (A.v = C.v AND A.w > C.w)
 OR (A.v = C.v AND A.w = C.w AND A.x > C.x)
 OR (A.v = C.v AND A.w = C.w AND A.x = C.x AND A.y > C.y)
 ORDER BY T.v ASC, T.w ASC, T.x ASC, T.y ASC
 LIMIT n

And finally for sorting on 5 columns:

 SELECT A FROM T
 WHERE A.v > C.v
 OR (A.v = C.v AND A.w > C.w)
 OR (A.v = C.v AND A.w = C.w AND A.x > C.x)
 OR (A.v = C.v AND A.w = C.w AND A.x = C.x AND A.y > C.y)
 OR (A.v = C.v AND A.w = C.w AND A.x = C.x AND A.y = C.y AND A.z > C.z)
 ORDER BY T.v ASC, T.w ASC, T.x ASC, T.y ASC, T.z ASC
 LIMIT n

That's a scary amount of comparisons. With k sort columns, the number of comparisons needed per row to calculate first n after c is the k-th triangular number, k(k+1)/2: 1, 3, 6, 10 and 15 for one to five columns. Luckily we can apply some boolean algebra to condense and optimize this query.

 SELECT A FROM T
 WHERE (A.v > C.v OR
           (A.v = C.v AND 
              (A.w > C.w OR
                   (A.w = C.w AND
                       (A.x > C.x OR
                           (A.x = C.x AND
                               (A.y > C.y OR
                                    (A.y = C.y AND
                                        (A.z > C.z)))))))))
 ORDER BY T.v ASC, T.w ASC, T.x ASC, T.y ASC, T.z ASC
 LIMIT n

Even after condensing it, the pattern is quite clear. Every condition line alternates between OR and AND, and between > and =, and every 2 condition lines we move on to the next order column.
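Because the pattern is so regular, the condition can be generated instead of written by hand. Here is a minimal sketch in Python; the function name after_condition is made up, and it emits the same A.col / C.col notation used in this post (all columns ascending), not executable SQL.

```python
def after_condition(columns):
    """Build the nested 'after cursor' condition for ascending sort columns.

    `columns` is the ORDER BY list, with the unique column last.
    Row columns are written as A.<col> and cursor values as C.<col>,
    mirroring the notation used in the queries above.
    """
    first, rest = columns[0], columns[1:]
    if not rest:
        # Last (unique) column: a plain comparison, no tie-break needed.
        return f"(A.{first} > C.{first})"
    # Either this column already decides, or it ties and the rest decides.
    return f"(A.{first} > C.{first} OR (A.{first} = C.{first} AND {after_condition(rest)}))"


condition = after_condition(["v", "w", "x", "y", "z"])
print(condition)
```

For k columns the nested form contains k greater-than comparisons and k - 1 equality comparisons, so 9 in total for five columns instead of the 15 in the flat version.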

And this comparison is surprisingly performant as well. On average, half of all rows will qualify after the first A.v > C.v check and stop there. And of the other half that do get through, the majority will fail at the second A.v = C.v check and stop there. So while it may generate big queries, I wouldn't be too worried about performance.

But let's get concrete and use this to answer how to use a cursor for the example in question:

 SELECT authors.id, authors.last_name, authors.created_at FROM authors
 ORDER BY authors.last_name, authors.created_at

This is your base query: sorted, but not yet paginated.

Your server receives a request to show "first 20 authors after author with cursor". After decoding the cursor, we find out that it represents the author with id 15.
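The decoding step itself can be as small as a base64 round trip. The sketch below is one possible way to do it in Python; the "author:<id>" payload format and both function names are assumptions made for illustration, not something prescribed by GraphQL or Relay.

```python
import base64


def encode_cursor(row_id):
    """Turn a primary key into an opaque cursor string (payload format is an assumption)."""
    return base64.urlsafe_b64encode(f"author:{row_id}".encode("utf-8")).decode("ascii")


def decode_cursor(cursor):
    """Recover the primary key from a cursor produced by encode_cursor."""
    payload = base64.urlsafe_b64decode(cursor.encode("ascii")).decode("utf-8")
    kind, _, raw_id = payload.partition(":")
    if kind != "author":
        raise ValueError("cursor does not refer to an author")
    return int(raw_id)


cursor = encode_cursor(15)
print(cursor, "->", decode_cursor(cursor))
```

The encoding only makes the cursor opaque to the client; all the real work still happens in the query that uses the decoded id.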

First we can run a small precursor query to get the information we will need:

 $authorLastName, $authorCreatedAt =
      SELECT authors.last_name, authors.created_at FROM authors WHERE id = 15;

Then we apply the algorithm and substitute the fields:

  SELECT a.id, a.last_name, a.created_at FROM authors a
  WHERE (a.last_name > $authorLastName OR
            (a.last_name = $authorLastName AND 
               (a.created_at > $authorCreatedAt OR
                    (a.created_at = $authorCreatedAt AND
                        (a.id > 15)))))
 ORDER BY a.last_name, a.created_at, a.id
 LIMIT 20;

There: this query will correctly return the first 20 authors after the author with ID 15, according to the sorts of the query.
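As a sanity check, the precursor query and the paginated query can be run against a throwaway database and compared with the brute-force answer (sort everything, then slice after the cursor row). This is a sketch using Python's sqlite3 module; the 60 generated authors are invented test data with deliberately many duplicate last names and dates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE authors (id INTEGER PRIMARY KEY, last_name TEXT, created_at TEXT)"
)
# Invented data: only 4 distinct names and 3 distinct dates, so ties are common.
names = ["Davis", "Clark", "Baker", "Adams"]
dates = ["2020-01-01", "2020-01-02", "2020-01-03"]
conn.executemany(
    "INSERT INTO authors VALUES (?, ?, ?)",
    [(i, names[i % 4], dates[i % 3]) for i in range(1, 61)],
)

# Precursor query: look up the sort values of the cursor row (id 15).
last_name, created_at = conn.execute(
    "SELECT last_name, created_at FROM authors WHERE id = ?", (15,)
).fetchone()

# The paginated query, with the cursor values bound as named parameters.
page = [r[0] for r in conn.execute(
    """
    SELECT a.id FROM authors a
    WHERE (a.last_name > :ln OR
              (a.last_name = :ln AND
                 (a.created_at > :ca OR
                      (a.created_at = :ca AND a.id > :id))))
    ORDER BY a.last_name, a.created_at, a.id
    LIMIT 20
    """,
    {"ln": last_name, "ca": created_at, "id": 15},
)]

# Reference answer: sort everything, then slice after the cursor row.
all_ids = [r[0] for r in conn.execute(
    "SELECT id FROM authors ORDER BY last_name, created_at, id")]
expected = all_ids[all_ids.index(15) + 1:][:20]

print(page == expected)
```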

If you don't like using variables or secondary queries, you can use subqueries as well:

  SELECT a.id, a.last_name, a.created_at FROM authors a
  WHERE (a.last_name > (SELECT last_name FROM authors WHERE id = 15) OR
            (a.last_name = (SELECT last_name FROM authors WHERE id = 15) AND
               (a.created_at > (SELECT created_at FROM authors WHERE id = 15) OR
                    (a.created_at = (SELECT created_at FROM authors WHERE id = 15) AND
                        (a.id > 15)))))
 ORDER BY a.last_name, a.created_at, a.id
 LIMIT 20;

Again, this isn't as bad as it seems: the subqueries are not correlated, so the database can evaluate each one once and reuse the result for every row, and it won't be particularly bad for performance. But the query does become messy, especially when you start using JOINs, which will need to be applied in the subqueries as well.

You may get away without explicitly ordering on a.id, but the order of rows that tie on the other columns is then not guaranteed, so I include it to be consistent with the algorithm. It becomes very important if you're using DESC instead of ASC.

So what happens if you use DESC columns instead of ASC? Does the algorithm break? Well, not if you apply a small extra rule. For whichever column uses DESC instead of ASC, you replace the '>' sign with '<', and the algorithm will work for sorting in both directions.
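Here is a sketch of that extra rule in Python with sqlite3. The helper after_condition, the table t and its rows are all hypothetical; the helper returns parameterized SQL, and column names are interpolated directly, so in real code they would have to come from a trusted whitelist.

```python
import sqlite3


def after_condition(order, cursor):
    """Nested 'after cursor' condition for mixed ASC/DESC sorts.

    order:  list of (column, "ASC" | "DESC"), unique column last
    cursor: dict mapping each sort column to the cursor row's value
    Returns (sql, params) with positional '?' placeholders.
    """
    (col, direction), rest = order[0], order[1:]
    op = ">" if direction == "ASC" else "<"  # the extra rule: DESC flips the sign
    if not rest:
        return f"({col} {op} ?)", [cursor[col]]
    inner_sql, inner_params = after_condition(rest, cursor)
    sql = f"({col} {op} ? OR ({col} = ? AND {inner_sql}))"
    return sql, [cursor[col], cursor[col]] + inner_params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 10), (2, 30), (3, 20), (4, 30), (5, 10)])

# Sorted by v DESC, id ASC the rows are: id 2, 4, 3, 1, 5.
order = [("v", "DESC"), ("id", "ASC")]
where, params = after_condition(order, {"v": 30, "id": 4})  # cursor row: id 4
order_by = ", ".join(f"{c} {d}" for c, d in order)
page = conn.execute(
    f"SELECT id, v FROM t WHERE {where} ORDER BY {order_by} LIMIT 2", params
).fetchall()

print(where, params, page)
```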

JOINs have no impact on this algorithm (thank god), other than the fact that 20 rows from joined tables won't necessarily represent 20 entities (20 authors in this case). But that's a problem that is independent of the whole first/after matter, and one you will also have when using OFFSET.

It's also not particularly difficult to handle queries which already have pre-existing WHERE conditions. You just take all the pre-existing conditions, wrap them in brackets, and combine them with an AND to the conditions generated by the algorithm.
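The brackets matter because AND binds more tightly than OR. The following Python/sqlite3 sketch shows the difference; paginate_where is a hypothetical helper, the table contents are invented, and the cursor condition is reduced to a simple "id > 1" to keep the example short.

```python
import sqlite3


def paginate_where(existing_where, cursor_condition):
    """AND the cursor condition onto an existing WHERE clause (hypothetical helper)."""
    if not existing_where:
        return cursor_condition
    return f"({existing_where}) AND ({cursor_condition})"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, last_name TEXT, country TEXT)")
conn.executemany("INSERT INTO authors VALUES (?, ?, ?)", [
    (1, "Adams", "US"), (2, "Baker", "FR"), (3, "Clark", "UK"), (4, "Davis", "US"),
])

existing = "country = 'US' OR country = 'UK'"
where = paginate_where(existing, "id > 1")
with_brackets = [r[0] for r in conn.execute(
    f"SELECT id FROM authors WHERE {where} ORDER BY id")]

# Without the brackets, AND binds tighter than OR and a row before the cursor leaks back in.
without_brackets = [r[0] for r in conn.execute(
    f"SELECT id FROM authors WHERE {existing} AND id > 1 ORDER BY id")]

print(with_brackets, without_brackets)
```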

There, we've implemented an algorithm that can handle any input query and properly paginate it using first/after. (If there are edge cases I missed, do let me know.)

And you could stop there, but... unfortunately

you still need to handle first n, last n, before c, after c, last n before c, last n after c and first n before c if you want to be compliant with the GraphQL Relay specs and get rid of offset completely :).

You can get halfway using the AFTER-algorithm I just provided. But for the other half you will need to use the BEFORE-algorithm. It's very similar to the AFTER-algorithm:

 SELECT A FROM T
 WHERE (A.v < C.v OR
           (A.v = C.v AND 
              (A.w < C.w OR
                   (A.w = C.w AND
                       (A.x < C.x OR
                           (A.x = C.x AND
                               (A.y < C.y OR
                                    (A.y = C.y AND
                                        (A.z < C.z)))))))))
 ORDER BY T.v ASC, T.w ASC, T.x ASC, T.y ASC, T.z ASC
 LIMIT n

To get the BEFORE-algorithm, you take the AFTER-algorithm and just switch all '>' operators to '<' operators and vice versa. (So in essence before and after are the same algorithm, with BEFORE/AFTER + ASC/DESC deciding which direction the operator has to point in.)
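That decision fits in a few lines. The sketch below, in Python, uses a made-up function name cursor_operator and covers all four combinations of BEFORE/AFTER and ASC/DESC.

```python
def cursor_operator(position, direction):
    """Pick the comparison operator for one sort column.

    position:  "after" or "before" (relative to the cursor)
    direction: "ASC" or "DESC" (that column's sort direction)
    """
    # after + ASC and before + DESC both mean "larger values"; the other two mean "smaller".
    wants_larger = (position == "after") == (direction == "ASC")
    return ">" if wants_larger else "<"


for position in ("after", "before"):
    for direction in ("ASC", "DESC"):
        print(position, direction, cursor_operator(position, direction))
```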

For 'first n' you don't need to do anything except apply 'LIMIT n' to the query.

For 'last n' you need to apply 'LIMIT n' and reverse all given ORDERs, switching ASC with DESC and DESC with ASC. There is one caveat with 'last n': while it will correctly return the last n records, it will do so in reversed order, so you need to reverse the returned set again, be it in your database or inside your code.
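Both steps of 'last n' can be seen in this short Python/sqlite3 sketch. The helper reverse_order and the table t with its five rows are invented for the example; the second reversal is done in application code here, though a wrapping query would work too.

```python
import sqlite3


def reverse_order(order):
    """Flip every (column, direction) pair: ASC becomes DESC and vice versa."""
    return [(col, "DESC" if d == "ASC" else "ASC") for col, d in order]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 5), (2, 3), (3, 5), (4, 1), (5, 3)])

# Requested order: v ASC, id ASC, which gives ids 4, 2, 5, 1, 3.
order = [("v", "ASC"), ("id", "ASC")]
flipped = ", ".join(f"{c} {d}" for c, d in reverse_order(order))

# Step 1: reversed ORDER BY + LIMIT n yields the last n rows, but back to front.
rows = conn.execute(f"SELECT id, v FROM t ORDER BY {flipped} LIMIT 2").fetchall()
# Step 2: restore the requested order in application code.
last_two = list(reversed(rows))

print(rows, last_two)
```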

There: with those rules you can successfully integrate any pagination request from the Relay GraphQL spec into any SQL query, using a unique sortable column, often the primary key, as the cursor that represents the source of truth for the default sorting of the table.

It's quite daunting, but I managed to write a plugin for the Doctrine DQL builder using those algorithms to implement first/last/before/after pagination methods on a MySQL database. So it's definitely doable.
