
Is tracking the query PID, from the client, possible in Postgres?

We have a client side application that allows very complex reporting on large data sets. On the left hand side is a tree of possible columns to include in the report, and on the right hand side is a table where we dynamically populate the results.

When a user clicks on the columns they want to include in their report, we're building the necessary SQL (often including a lot of joins and complex sub-queries), and firing those queries off asynchronously.

Problem: Users are creating complex reports resulting in very complex and costly queries. These queries end up backing up and causing us to run out of connections.

I ended up finding sections of our logs that look like this:

169077:2019-09-11 22:14:29 UTC LOG:  duration: 65018.497 ms  execute <unnamed>:      
169105:2019-09-11 22:14:31 UTC LOG:  duration: 22494.712 ms  execute <unnamed>: SELEC
169129:2019-09-11 22:14:34 UTC LOG:  duration: 67866.947 ms  execute <unnamed>:      
169157:2019-09-11 22:14:40 UTC LOG:  duration: 51176.815 ms  execute <unnamed>:      
169185:2019-09-11 22:14:41 UTC LOG:  duration: 51836.988 ms  execute <unnamed>:      
169213:2019-09-11 22:14:42 UTC LOG:  duration: 52655.482 ms  execute <unnamed>:      
169244:2019-09-11 22:14:46 UTC LOG:  duration: 55871.561 ms  execute <unnamed>:    

Ouch! You can see from the timestamps that this is a user adding more columns they want to report on (I confirmed by looking at the queries), blissfully unaware of the pain our database is going through.

Here are some solutions I thought of:

1) Remove the asynchronous querying, making the end user build the report first and then click a button to actually run it. This is not ideal, as our current user base (which is quite large) would definitely be confused if we made this change (it's unfortunately a UX regression).

2) When the end user clicks a column and the query is fired off asynchronously, somehow trace the PID of the query that is actually running within Postgres. When the same user clicks another column, kill the previous PID, and start tracking the new PID. This would ensure only one query is running for this end user during the report building process at any given time, and would prevent the long running query buildup as seen in the example above.

Is #2 even possible? I looked into tracing with probes and briefly at PgBouncer, but I'm not too familiar with either and wasn't able to find a definitive answer.

Any thoughts or suggestions are welcome!

Here is a generic approach to implementing #2 on the client side.

It looks like you are after the pg_backend_pid function.

Call it after establishing the connection, remember your PID, then start your async query using that connection and PID.

If it turns out that you need to stop the query later, use pg_cancel_backend with the previously remembered PID to cancel it.
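
Here is a minimal sketch of that flow, assuming a Python client using psycopg2 (any driver that lets you run plain SQL on two separate connections works the same way). The DSN and the pg_sleep stand-in query are placeholders for your real connection settings and generated report SQL:

```python
import threading
import time

import psycopg2
from psycopg2 import errors

DSN = "dbname=reports user=app host=localhost"  # placeholder connection string


def start_report(dsn, sql, pid_holder):
    """Run the report query on its own connection, publishing its backend PID first."""
    conn = psycopg2.connect(dsn)
    try:
        with conn.cursor() as cur:
            # Remember which server process (PID) this connection is talking to.
            cur.execute("SELECT pg_backend_pid()")
            pid_holder["pid"] = cur.fetchone()[0]
            try:
                cur.execute(sql)  # the expensive, generated report SQL
                print("report finished:", cur.fetchone())
            except errors.QueryCanceled:
                print("report query was cancelled")
    finally:
        conn.close()


def cancel_report(dsn, pid):
    """Ask Postgres to cancel the query running on the given backend PID."""
    conn = psycopg2.connect(dsn)
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT pg_cancel_backend(%s)", (pid,))
            return cur.fetchone()[0]  # True if the cancel signal was sent
    finally:
        conn.close()


if __name__ == "__main__":
    holder = {}
    worker = threading.Thread(
        target=start_report,
        args=(DSN, "SELECT pg_sleep(600)", holder),  # stand-in for a slow report query
    )
    worker.start()
    time.sleep(1)  # give the query a moment to start
    print("cancelled:", cancel_report(DSN, holder["pid"]))
    worker.join()
```

In your application, the "cancel" path would run whenever the user clicks another column: cancel the PID you stored for the previous query, then fire the new one and store its PID instead.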


Or, you can use functions specific to the library that you use to connect to Postgres.

For example, if you use libpq, you can use the PQcancel function to stop the query.
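
For instance, psycopg2 (which is built on libpq) exposes this same cancellation mechanism as connection.cancel(), so you can skip the PID bookkeeping if you keep a handle to the connection running the report. A sketch under the same placeholder DSN and stand-in query as above:

```python
import threading
import time

import psycopg2
from psycopg2 import errors

DSN = "dbname=reports user=app host=localhost"  # placeholder connection string

conn = psycopg2.connect(DSN)


def run_report():
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT pg_sleep(600)")  # stand-in for the report SQL
    except errors.QueryCanceled:
        print("report query was cancelled")


worker = threading.Thread(target=run_report)
worker.start()
time.sleep(1)  # give the query a moment to start

# cancel() asks the server to cancel the command currently running on this
# connection (the driver uses libpq's cancellation facility under the hood).
conn.cancel()
worker.join()
conn.close()
```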

