
How can I stop an active query on a Postgres coordinator when the worker node has crashed?

Original 2022-12-22 21:14:06 6 1 postgresql/ distributed/ kill/ citus

I have a Postgres SELECT query that cannot be stopped with the standard pg_cancel_backend/pg_terminate_backend functions. Both return true but do nothing. The query holds ACCESS SHARE locks on hundreds of tables, which makes it impossible for our ETL jobs to create new partitions. The query is listed as active, but we believe it is simply waiting for a response from the worker node that will never be sent.

The database cannot be brought down because of the nature of our data ingestion. We cannot use TCP spoofing to send a signal to the coordinator, as that has been blocked at the OS level.

System details:
Linux: Red Hat Enterprise Linux Server 7.4 (Maipo)
Pg: 10.6
citus: 8.1.1

pg_stat_activity for the query in question: [image: pgStat_activity]

Our team has done the following:

select pg_cancel_backend(30334);
select pg_terminate_backend(30334);

At the Linux command prompt: kill 30334

Both queries return TRUE, the kill command does nothing, and the session persists. Even if we attempted to stop the Postgres database, we are afraid the shutdown would wait for the query to finish.

Looking for suggestions that don't involve kill -9. Has anyone come across this problem before?

1 answer (2022-12-27 12:46:35)

We see this happening on systems where TCP keepalive is misconfigured. The technical details are complicated, but when you run into these symptoms, Citus is waiting for a response on the socket to the worker. Because TCP keepalive is misconfigured, the Linux kernel keeps the socket open even though the remote end has already disappeared. With keepalive working, the kernel sends probe messages over TCP sockets that have no natural traffic. The probes expose the broken socket, and the resulting close is signalled to the Citus extension that is waiting for a response.
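To see how long the kernel would take to notice a dead peer, the three keepalive settings can be read from /proc and combined. The following is a minimal sketch, not part of the original answer; the helper name `keepalive_detect_seconds` is made up for illustration, and the estimate only applies to sockets that have keepalive enabled.

```shell
#!/bin/sh
# Worst-case seconds before the kernel gives up on an idle, dead peer:
# idle time before the first probe + (probe interval * number of probes).
# Hypothetical helper name, used here for illustration only.
keepalive_detect_seconds() {
    echo $(( $1 + $2 * $3 ))
}

# Read the live kernel settings when they are available (Linux only).
proc=/proc/sys/net/ipv4
if [ -r "$proc/tcp_keepalive_time" ] &&
   [ -r "$proc/tcp_keepalive_intvl" ] &&
   [ -r "$proc/tcp_keepalive_probes" ]; then
    t=$(cat "$proc/tcp_keepalive_time")
    i=$(cat "$proc/tcp_keepalive_intvl")
    p=$(cat "$proc/tcp_keepalive_probes")
    echo "time=${t}s intvl=${i}s probes=${p}"
    echo "dead peer detected after about $(keepalive_detect_seconds "$t" "$i" "$p")s idle"
fi
```

With the stock Linux values (7200, 75, 9) this works out to 7875 seconds, a little over two hours, which is why a backend can appear stuck for a long time after a worker crash.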

To answer your question: if you cannot restart the coordinator, you will need to manually attach GDB to the backend that is in this state. From the backtrace you can poke around and find the file descriptor that the process is blocked on; you might want to install the debug symbols for both PostgreSQL and Citus first. Calling close ( https://linux.die.net/man/2/close ) on this file descriptor will help unblock the backend. Alternatively, the FD can sometimes be found with strace.
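The GDB route could look roughly like the sketch below. It is an illustration under assumptions, not a tested recipe: the helper names are made up, fd 12 is a placeholder (the real descriptor has to come from the backtrace, from strace -p, or from the /proc listing), and the script only prints the gdb command rather than running it, since closing the wrong descriptor in a live backend is risky.

```shell
#!/bin/sh
# List the socket file descriptors a process holds, via /proc (Linux only).
# Needs permission to read /proc/<pid>/fd (same user as the backend, or root).
list_socket_fds() {
    for fd in /proc/"$1"/fd/*; do
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            socket:*) echo "${fd##*/} $target" ;;
        esac
    done
}

# Print (do not run) a gdb command that attaches to the process,
# calls close() on one descriptor, then detaches and exits.
gdb_close_cmd() {
    echo "gdb -p $1 -batch -ex 'call (int) close($2)'"
}

# PID 30334 is the backend from the question; fd 12 is a placeholder.
list_socket_fds 30334
gdb_close_cmd 30334 12
```

Review the printed command and the descriptor number before running it by hand. A plain `gdb -p 30334 -batch -ex 'bt'` first shows where the backend is blocked without changing anything.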

Lastly, you will want to enable and configure TCP keepalive so that the kernel actually sends keepalive messages and closes the socket for you when the other end disappears. Pointers can be found here: https://tldp.org/HOWTO/TCP-Keepalive-HOWTO/usingkeepalive.html
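As a rough sketch of the kernel-side configuration, the snippet below renders the three sysctl keys described in that HOWTO. The values (60, 10, 6) are illustrative assumptions, not recommendations from the answer, and the function name is made up.

```shell
#!/bin/sh
# Render sysctl settings for TCP keepalive.
# Arguments: idle seconds before the first probe, seconds between probes, probe count.
# Hypothetical helper name, used here for illustration only.
render_keepalive_conf() {
    printf 'net.ipv4.tcp_keepalive_time = %s\n' "$1"
    printf 'net.ipv4.tcp_keepalive_intvl = %s\n' "$2"
    printf 'net.ipv4.tcp_keepalive_probes = %s\n' "$3"
}

# Illustrative values only: probe after 60s idle, every 10s, give up after 6 probes.
render_keepalive_conf 60 10 6
```

To apply it, the output would be written as root to a file under /etc/sysctl.d/ (for example a hypothetical 90-tcp-keepalive.conf) and loaded with `sysctl --system`. These kernel values only matter for sockets that have keepalive switched on, so it is worth checking that the coordinator-to-worker connections do.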
