
Shared Scanning in Postgres

Original post: 2020-06-04 02:23:41 · Tags: postgresql, memory-management, database-design, database-performance

In lecture 11 of the CMU Intro to Databases course (2020, at 39:37), Andy Pavlo states that "only the high-end data systems support shared buffer scanning, but Postgres and MySQL cannot." He does not elaborate, so I tried to find out why, but I couldn't find a high-level explanation and wanted to ask here before diving into the documentation. Did Andy mean that Postgres cannot support this because of its design, or that it simply has not been implemented yet?

If it cannot be implemented, what about the Postgres design prevents it? How could this be circumvented? If it is possible, what is preventing it from being implemented today? Thanks in advance.

1 Answer

Listening to the talk, he says something like this:

If we do a merge join, we have got to sort the tables. Now, if we detect that two queries want to sort the same data at the same time, it would be cool if the queries could piggy-back on each other. The high-end systems can do that, but Postgres and MySQL cannot.

That's only partly true.

It is true that each backend (that is, each query) that wants to sort has to do so on its own, and there is no way to share sorted results.

But I don't think that would be a very valuable feature:

- Any two queries will likely see different versions of the data (imagine a row inserted between the start of the first query and the start of the second), so they couldn't share the result anyway. This could only be used if two queries want to sort exactly the same set of rows in exactly the same way at approximately the same time, which seems too much of a corner case to justify adding a complicated feature.
- Sharing data between PostgreSQL backends is difficult because of PostgreSQL's multi-process architecture: each backend is a separate operating-system process, so a sort result held in one backend's private memory is not visible to the others.
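The visibility argument can be made concrete with a toy model. This is a minimal Python sketch, not PostgreSQL code: it assumes a snapshot is just the set of transaction ids whose rows it can see (real PostgreSQL snapshots track xmin, xmax and a list of in-progress transactions), and `SharedSortCache` is a hypothetical feature that PostgreSQL does not have. Because a sort result is only correct for one snapshot, the snapshot has to be part of the cache key, and the cache only pays off when two queries happen to have identical snapshots.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    key: int
    inserted_by: int  # id of the transaction that inserted this row


def sort_visible(table, snapshot):
    # Each backend sorts only the rows its own snapshot can see.
    # Simplification: a row is visible if its inserting transaction is in the snapshot.
    return tuple(sorted(r.key for r in table if r.inserted_by in snapshot))


class SharedSortCache:
    """Hypothetical cross-query sort cache (PostgreSQL has nothing like it).

    A cached result is only correct for an identical snapshot, so the
    snapshot has to be part of the cache key.
    """

    def __init__(self):
        self._results = {}
        self.hits = 0

    def get_sorted(self, table_name, table, snapshot):
        key = (table_name, snapshot)
        if key in self._results:
            self.hits += 1
        else:
            self._results[key] = sort_visible(table, snapshot)
        return self._results[key]


table = [Row(30, inserted_by=1), Row(10, inserted_by=1)]
snap_a = frozenset({1})               # query A takes its snapshot
table.append(Row(20, inserted_by=2))  # transaction 2 inserts a row and commits
snap_b = frozenset({1, 2})            # query B starts afterwards and sees it

cache = SharedSortCache()
result_a = cache.get_sorted("t", table, snap_a)
result_b = cache.get_sorted("t", table, snap_b)
print(result_a, result_b, cache.hits)  # (10, 30) (10, 20, 30) 0

# Only a query with exactly the same snapshot can reuse a cached result.
result_c = cache.get_sorted("t", table, frozenset({1, 2}))
print(cache.hits)  # 1
```

Queries A and B read the same table at almost the same moment, yet their sorted results differ, so sharing would have been wrong; only the corner case of query C, with an identical snapshot, gets a cache hit.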

But what PostgreSQL can do (and here the speaker is wrong) is let two queries share a sequential scan of the same table: if you leave synchronize_seqscans at its default value of on, a second query that wants to scan a table that an already running query is scanning will simply piggy-back onto the running sequential scan. That is easier, because the data are in shared_buffers, which is a shared resource. This feature reduces I/O if you have many concurrent sequential scans of the same table.
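The effect of a synchronized scan on I/O can be illustrated with a toy simulation. The following Python sketch is not PostgreSQL's implementation. It assumes a plain LRU buffer pool of `BUFFER_SLOTS` pages (PostgreSQL actually uses clock-sweep replacement, plus a small ring buffer for large sequential scans), one block read per scan per tick, and a shared `position_hint` standing in for the scan-position reporting that synchronize_seqscans enables. All sizes are made up. A joining scan starts at the most recently reported block and wraps around to the start of the table to pick up the blocks it skipped.

```python
from collections import OrderedDict, deque

NUM_BLOCKS = 100   # hypothetical table size in blocks
BUFFER_SLOTS = 10  # hypothetical buffer pool, much smaller than the table
JOIN_AT_TICK = 40  # scan B starts once scan A has read 40 blocks


class BufferPool:
    """Plain LRU cache standing in for shared_buffers; counts disk reads."""

    def __init__(self, slots):
        self.slots = slots
        self.pages = OrderedDict()
        self.disk_reads = 0

    def read(self, block):
        if block in self.pages:
            self.pages.move_to_end(block)    # hit: mark as recently used
            return
        self.disk_reads += 1                 # miss: fetch from "disk"
        self.pages[block] = True
        if len(self.pages) > self.slots:
            self.pages.popitem(last=False)   # evict the least recently used page


def scan_order(start):
    # Visit every block once, starting at `start` and wrapping around.
    return deque((start + i) % NUM_BLOCKS for i in range(NUM_BLOCKS))


def simulate(synchronize):
    pool = BufferPool(BUFFER_SLOTS)
    scans = [scan_order(0)]   # scan A is already running
    position_hint = 0         # last block reported by any running scan
    tick = 0
    while scans or tick <= JOIN_AT_TICK:
        if tick == JOIN_AT_TICK:
            # Scan B arrives: join A at the reported position, or start at block 0.
            scans.append(scan_order(position_hint if synchronize else 0))
        for blocks in scans:
            block = blocks.popleft()
            pool.read(block)
            position_hint = block
        scans = [s for s in scans if s]   # drop finished scans
        tick += 1
    return pool.disk_reads


print(simulate(synchronize=False), simulate(synchronize=True))  # 200 139
```

With these made-up numbers, two independent scans cause 200 block reads, while the synchronized pair causes 139: the second scan finds blocks 39 to 99 already in the buffer pool just behind the first scan and only has to read blocks 0 to 38 itself after wrapping around. That wrap-around is also why, with synchronize_seqscans on, a query without ORDER BY can return rows in a different order from one run to the next. In PostgreSQL, synchronization is only attempted for tables larger than a quarter of shared_buffers.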