
CloudSQL Postgres (11 or 14) very slow performance and TPS on good instance configuration

We are having a few issues with our CloudSQL instances. We have tried multiple configurations, from 1 vCPU to 4 vCPUs, up to 24 GB RAM and 100 GB SSD, but we always see roughly the same results: around 300-500 ms to respond to a single small request, translating to around 2-3 TPS when testing a single connection with pgbench:

transaction type: <builtin: TPC-B (sort of)>
scaling factor: 50
query mode: simple
number of clients: 1
number of threads: 1
duration: 100 s
number of transactions actually processed: 207
latency average = 483.165 ms
initial connection time = 359.565 ms
tps = 2.069685 (without initial connection time)

This scales OK with multiple connections, as the following result shows, but it is still very slow overall:

transaction type: <builtin: TPC-B (sort of)>
scaling factor: 50
query mode: simple
number of clients: 50
number of threads: 50
duration: 100 s
number of transactions actually processed: 9296
latency average = 538.585 ms
initial connection time = 441.047 ms
tps = 92.835803 (without initial connection time)

Changing from PG 14 to 11 gained around 15% performance on the larger benchmark sets, but a single-connection test is still slow.

We haven't touched any of the more advanced settings such as database flags; the instances are pretty much on defaults.

When running the same test on a local DB (Mac M1 Pro) we get much better numbers, as expected, but the difference looks too large:

transaction type: <builtin: TPC-B (sort of)>
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
maximum number of tries: 1
duration: 30 s
number of transactions actually processed: 4057
number of failed transactions: 0 (0.000%)
latency average = 7.393 ms
initial connection time = 17.055 ms
tps = 135.263939 (without initial connection time)

Those connections are using the latest stable version of the Cloud SQL Auth Proxy.

We expected the TPS to be closer to the numbers we see online, at least around 5-10 TPS per connection.

This is the result from a machine closer to the server in us-central:

transaction type: <builtin: TPC-B (sort of)>
scaling factor: 1
query mode: simple
number of clients: 5
number of threads: 1
duration: 30 s
number of transactions actually processed: 207
latency average = 690.794 ms
initial connection time = 2037.796 ms
tps = 7.238044 (without initial connection time)

In your first test, you have a network latency problem. Note that the default pgbench transaction is not "a single small request". It is seven separate small requests, each needing one round trip in sequence, which together make up one transaction. To demonstrate the effect of these multiple round trips, you could define a function which packages up all 7 tasks (5, not counting the transaction markers) into a UDF, and then write a custom transaction which calls that UDF in autocommit mode. You should find it is about 7 times faster despite doing the same work. Of course, there is no point in doing this demo if you believe me that you have a networking problem. In newer versions of PostgreSQL, you could also use the pipelining feature to have several of those tasks in flight at the same time, rather than waiting for one response before sending the next request.
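The UDF demo described above could be sketched like this. The function name tpcb_in_one_call is my own illustration, not something from the answer; its body mirrors the five statements of the default pgbench TPC-B script:

```sql
-- Hypothetical sketch: wrap the five statements of the default pgbench
-- TPC-B transaction in one server-side function, so each transaction
-- costs the client one round trip instead of seven.
CREATE FUNCTION tpcb_in_one_call(p_aid int, p_bid int, p_tid int, p_delta int)
RETURNS int
LANGUAGE plpgsql
AS $$
DECLARE
    v_abalance int;
BEGIN
    UPDATE pgbench_accounts SET abalance = abalance + p_delta WHERE aid = p_aid;
    SELECT abalance INTO v_abalance FROM pgbench_accounts WHERE aid = p_aid;
    UPDATE pgbench_tellers  SET tbalance = tbalance + p_delta WHERE tid = p_tid;
    UPDATE pgbench_branches SET bbalance = bbalance + p_delta WHERE bid = p_bid;
    INSERT INTO pgbench_history (tid, bid, aid, delta, mtime)
        VALUES (p_tid, p_bid, p_aid, p_delta, CURRENT_TIMESTAMP);
    RETURN v_abalance;
END;
$$;
```

A custom pgbench script (say, tpcb_func.sql) would then keep the usual \set lines for :aid, :bid, :tid and :delta, followed by a single `SELECT tpcb_in_one_call(:aid, :bid, :tid, :delta);`, and be run with `pgbench -n -f tpcb_func.sql`. Since the script contains no BEGIN/END, each call runs in autocommit mode, as suggested.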

In your closer-to-"us-central" test, you have a concurrency problem as well as a (smaller) network latency problem. In that test, you show the scaling factor as being 1. That means every transaction is updating the same row in the pgbench_branches table, so all concurrent transactions conflict with each other and run partially in single file. Generally the scaling factor should be at least as large as the number of clients you are testing (5 clients here; why not use a scaling factor of 50, as you did in the first test?).
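Concretely, that means re-initializing the benchmark tables with a larger scale factor before re-running the test. The database name mydb below is a placeholder:

```
# Re-create the pgbench tables with scale factor 50 (>= number of clients).
pgbench -i -s 50 mydb

# Re-run the 5-client test against the larger data set.
pgbench -c 5 -j 1 -T 30 mydb
```

With 50 branch rows instead of 1, the 5 clients rarely contend for the same pgbench_branches row, so the measured latency reflects the network rather than row-lock waits.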

