

CloudSQL Postgres (11 or 14): very slow performance and TPS on a good instance configuration

We are having a few issues with our CloudSQL instances. We have tried multiple configurations, from 1 vCPU to 4 vCPUs, up to 24 GB RAM and 100 GB SSD, and we always get roughly the same result: around 300-500 ms to respond to a single small request, which translates to around 2-3 TPS when testing a single connection with pgbench:

transaction type: <builtin: TPC-B (sort of)>
scaling factor: 50
query mode: simple
number of clients: 1
number of threads: 1
duration: 100 s
number of transactions actually processed: 207
latency average = 483.165 ms
initial connection time = 359.565 ms
tps = 2.069685 (without initial connection time)

As the results below show, this scales acceptably with multiple connections, but it is still very slow overall:

transaction type: <builtin: TPC-B (sort of)>
scaling factor: 50
query mode: simple
number of clients: 50
number of threads: 50
duration: 100 s
number of transactions actually processed: 9296
latency average = 538.585 ms
initial connection time = 441.047 ms
tps = 92.835803 (without initial connection time)

Changing from PG 14 to PG 11 gained around 15% performance on larger benchmark sets, but a single-connection test is still slow.

We haven't touched any of the more advanced settings, such as database flags; the instances are pretty much on defaults.

When running the same test against a local database (Mac M1 Pro) we get much better numbers, as expected, but the difference looks too large:

transaction type: <builtin: TPC-B (sort of)>
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
maximum number of tries: 1
duration: 30 s
number of transactions actually processed: 4057
number of failed transactions: 0 (0.000%)
latency average = 7.393 ms
initial connection time = 17.055 ms
tps = 135.263939 (without initial connection time)

These connections use the latest stable version of the Cloud SQL Auth Proxy.

We expected the TPS to be closer to the numbers we see online, at least around 5-10 TPS per connection.

This is the result from a machine closer to the server, in us-central:

transaction type: <builtin: TPC-B (sort of)>
scaling factor: 1
query mode: simple
number of clients: 5
number of threads: 1
duration: 30 s
number of transactions actually processed: 207
latency average = 690.794 ms
initial connection time = 2037.796 ms
tps = 7.238044 (without initial connection time)

In your first test, you have a network latency problem. Note that the default pgbench transaction is not "a single small request". It is seven separate small requests, each needing one round trip in sequence, which together make up one transaction. To demonstrate the effect of these multiple round trips, you could package all 7 tasks (5, not counting the transaction markers) into a UDF, and then run a custom transaction which calls that UDF in autocommit mode. You should find it is about 7 times faster despite doing the same work. Of course, there is no point in running this demo if you believe me that you have a networking problem.

In newer versions of PostgreSQL, you could also use the pipelining feature to have several of those statements in flight at the same time, rather than waiting for each response before sending the next request.
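As a sketch of that experiment (assuming the standard pgbench_* tables created by pgbench -i; the function name tpcb_like is made up for illustration), the five data-modifying statements of the default transaction could be packaged into a single server-side call:

```sql
-- Hypothetical UDF bundling the five TPC-B-like statements into one
-- server round trip. The name tpcb_like is illustrative, not standard.
CREATE FUNCTION tpcb_like(p_aid int, p_tid int, p_bid int, p_delta int)
RETURNS int LANGUAGE plpgsql AS $$
DECLARE
  bal int;
BEGIN
  UPDATE pgbench_accounts SET abalance = abalance + p_delta WHERE aid = p_aid;
  SELECT abalance INTO bal FROM pgbench_accounts WHERE aid = p_aid;
  UPDATE pgbench_tellers  SET tbalance = tbalance + p_delta WHERE tid = p_tid;
  UPDATE pgbench_branches SET bbalance = bbalance + p_delta WHERE bid = p_bid;
  INSERT INTO pgbench_history (tid, bid, aid, delta, mtime)
    VALUES (p_tid, p_bid, p_aid, p_delta, CURRENT_TIMESTAMP);
  RETURN bal;
END;
$$;
```

A pgbench custom script (say, tpcb_udf.sql) would then keep the usual \set lines for :aid, :tid, :bid and :delta but replace the BEGIN ... END block with a single SELECT tpcb_like(:aid, :tid, :bid, :delta); run via pgbench -n -f tpcb_udf.sql. With no explicit transaction markers, each call runs in autocommit mode as one round trip.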

In your closer-to-"us-central" test, you have a concurrency problem as well as a (smaller) network latency problem. In that test, the scaling factor is 1. That means every transaction updates the same row in the branches table, so all concurrent transactions conflict with each other and run partially single-file. Generally, the scaling factor should be at least as large as the number of clients you are testing (5 clients here; but why not use a scaling factor of 50, like you did in the first test?).
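Concretely, that fix is just a re-initialization at a larger scale before re-running the benchmark (a minimal sketch, assuming a database named bench; any Cloud SQL Auth Proxy connection flags are omitted):

```shell
# Re-initialize the pgbench tables with scaling factor 50, so the 5
# concurrent clients update different pgbench_branches rows instead of
# serializing on a single one.
pgbench -i -s 50 bench

# Re-run the benchmark: 5 clients, 30 seconds.
pgbench -c 5 -j 1 -T 30 bench
```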

