postdock for Postgres on Docker for Windows - pgpool not connecting to the databases

I have been trying to implement this and I cannot figure out why it will not work. I have read about many people downloading it and running it as is, but in my case pgpool never connects to the master or the slave. I pulled the Docker Compose file from paunin's example in issue 57 and changed the image to the current postdock/postgres.

I start the stack with the following command, and my Docker Compose file is shown after it:

docker-compose -f .\basic.yml up -d

version: '2'
        driver: bridge

        image: postdock/postgres
            PARTNER_NODES: "pgmaster,pgslave1"
            NODE_ID: 1 # Integer number of node
            NODE_NAME: node1 # Node name
            CLUSTER_NODE_NETWORK_NAME: pgmaster
            POSTGRES_PASSWORD: monkey_pass
            POSTGRES_USER: monkey_user
            POSTGRES_DB: monkey_db
            CONFIGS: "listen_addresses:'*'"
            - 5431:5432
                    - pgmaster
        image: postdock/postgres
            PARTNER_NODES: "pgmaster,pgslave1"
            REPLICATION_PRIMARY_HOST: pgmaster
            NODE_ID: 2
            NODE_NAME: node2
            CLUSTER_NODE_NETWORK_NAME: pgslave1
            - 5441:5432
                    - pgslave1

        image: postdock/pgpool
            PCP_USER: pcp_user
            PCP_PASSWORD: pcp_pass
            WAIT_BACKEND_TIMEOUT: 60
            CHECK_USER: monkey_user
            CHECK_PASSWORD: monkey_pass
            DB_USERS: monkey_user:monkey_pass
            BACKENDS: "0:pgmaster:5432:1:/var/lib/postgresql/data:ALLOW_TO_FAILOVER,1:pgslave1::::"
            CONFIGS: "num_init_children:250,max_pool:4"
            - 5432:5432
            - 9898:9898 # PCP
                    - pgpool
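The `BACKENDS` value is the densest setting in the pgpool service, so it may help to see how it breaks down. The sketch below is only an illustration, not postdock's own parsing code. It assumes the field order `index:host:port:weight:data_directory:flag` and the defaults (port 5432, weight 1, data directory `/var/lib/postgresql/data`, flag `ALLOW_TO_FAILOVER`) described in the postdock README; the `Backend` type and the `parse_backends` helper are hypothetical names.

```python
from typing import List, NamedTuple

# Defaults assumed from the postdock README; verify them against the image in use.
DEFAULT_PORT = 5432
DEFAULT_WEIGHT = 1
DEFAULT_DATA_DIR = "/var/lib/postgresql/data"
DEFAULT_FLAG = "ALLOW_TO_FAILOVER"


class Backend(NamedTuple):
    index: int
    host: str
    port: int
    weight: int
    data_dir: str
    flag: str


def parse_backends(value: str) -> List[Backend]:
    """Split a BACKENDS string into one Backend per comma-separated entry."""
    backends = []
    for entry in value.split(","):
        # Pad to six fields so that short entries still unpack cleanly.
        fields = (entry.split(":") + [""] * 6)[:6]
        index, host, port, weight, data_dir, flag = fields
        backends.append(Backend(
            index=int(index),
            host=host,
            port=int(port) if port else DEFAULT_PORT,
            weight=int(weight) if weight else DEFAULT_WEIGHT,
            data_dir=data_dir or DEFAULT_DATA_DIR,
            flag=flag or DEFAULT_FLAG,
        ))
    return backends


if __name__ == "__main__":
    value = "0:pgmaster:5432:1:/var/lib/postgresql/data:ALLOW_TO_FAILOVER,1:pgslave1::::"
    for backend in parse_backends(value):
        print(backend)
```

Under these assumptions both entries resolve to port 5432, so pgpool would be expected to reach the databases at `pgmaster:5432` and `pgslave1:5432` on the container network, not through the published host ports 5431 and 5441.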

Both the master and the replica database seem to come up fine. I can see both in pgAdmin, and I can create a table and see it appear in monkey_db. However, the table never shows up on the replica.

Here is the log for the master container:

PS C:\platform\docker\basic> docker logs basic_pgmaster_1
>>> Setting up STOP handlers...
>>> STARTING SSH (if required)...
No pre-populated ssh keys!
cp: cannot stat '/home/postgres/.ssh/keys/*': No such file or directory
>>> SSH is not enabled!
>>> SETTING UP POLYMORPHIC VARIABLES (repmgr=3+postgres=9 | repmgr=4, postgres=10)...
>>> Cleaning data folder which might have some garbage...
>>> Check all partner nodes for common upstream node...
>>>>>> Checking NODE=pgmaster...
psql: could not connect to server: Connection refused
        Is the server running on host "pgmaster" ( and accepting
        TCP/IP connections on port 5432?
>>>>>> Skipping: failed to get master from the node!
>>>>>> Checking NODE=pgslave1...
psql: could not connect to server: Connection refused
        Is the server running on host "pgslave1" ( and accepting
        TCP/IP connections on port 5432?
>>>>>> Skipping: failed to get master from the node!
>>> Auto-detected master name: ''
>>> Setting up repmgr...
>>> Setting up repmgr config file '/etc/repmgr.conf'...
>>> Setting up upstream node...
>>> Sending in background postgres start...
>>> Waiting for local postgres server recovery if any in progress:LAUNCH_RECOVERY_CHECK_INTERVAL=30
>>> Recovery is in progress:
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /var/lib/postgresql/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

Success. You can now start the database server using:

    pg_ctl -D /var/lib/postgresql/data -l logfile start

WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.
waiting for server to start....2018-09-20 06:03:29.170 UTC [85] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2018-09-20 06:03:29.197 UTC [86] LOG:  database system was shut down at 2018-09-20 06:03:28 UTC
2018-09-20 06:03:29.202 UTC [85] LOG:  database system is ready to accept connections
server started


/docker-entrypoint.sh: running /docker-entrypoint-initdb.d/entrypoint.sh
>>> Configuring /var/lib/postgresql/data/postgresql.conf
>>>>>> Config file was replaced with standard one!
>>>>>> Adding config 'listen_addresses'=''*''
>>>>>> Adding config 'shared_preload_libraries'=''repmgr_funcs''
>>> Creating replication user 'replication_user'
>>> Creating replication db 'replication_db'

waiting for server to shut down...2018-09-20 06:03:30.494 UTC [85] LOG:  received fast shutdown request
.2018-09-20 06:03:30.514 UTC [85] LOG:  aborting any active transactions
2018-09-20 06:03:30.517 UTC [85] LOG:  worker process: logical replication launcher (PID 92) exited with exit code 1
2018-09-20 06:03:30.517 UTC [87] LOG:  shutting down
2018-09-20 06:03:30.542 UTC [85] LOG:  database system is shut down
server stopped

PostgreSQL init process complete; ready for start up.

2018-09-20 06:03:30.608 UTC [47] LOG:  listening on IPv4 address "", port 5432
2018-09-20 06:03:30.608 UTC [47] LOG:  listening on IPv6 address "::", port 5432
2018-09-20 06:03:30.616 UTC [47] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2018-09-20 06:03:30.646 UTC [131] LOG:  database system was shut down at 2018-09-20 06:03:30 UTC
2018-09-20 06:03:30.664 UTC [47] LOG:  database system is ready to accept connections
>>>>>> RECOVERY_WAL_ID is empty!
>>> Not in recovery state (anymore)
>>> Waiting for local postgres server start...
>>> Wait schema replication_db.public on pgmaster:5432(user: replication_user,password: *******), will try 9 times with delay 10 seconds (TIMEOUT=90)
>>>>>> Schema replication_db.public exists on host pgmaster:5432!
>>> Registering node with role master
INFO: connecting to master database
INFO: master register: creating database objects inside the 'repmgr_pg_cluster' schema
INFO: retrieving node list for cluster 'pg_cluster'
[REPMGR EVENT] Node id: 1; Event type: master_register; Success [1|0]: 1; Time: 2018-09-20 06:03:56.560674+00;  Details:
[REPMGR EVENT] will execute script '/usr/local/bin/cluster/repmgr/events/execs/master_register.sh' for the event
[REPMGR EVENT::master_register] Node id: 1; Event type: master_register; Success [1|0]: 1; Time: 2018-09-20 06:03:56.560674+00;  Details:
[REPMGR EVENT::master_register] Locking master...
[REPMGR EVENT::master_register] Unlocking standby...
NOTICE: master node correctly registered for cluster 'pg_cluster' with id 1 (conninfo: user=replication_user password=replication_pass host=pgmaster dbname=replication_db port=5432 connect_timeout=2)
>>> Starting repmgr daemon...
[2018-09-20 06:03:56] [NOTICE] looking for configuration file in current directory
[2018-09-20 06:03:56] [NOTICE] looking for configuration file in /etc
[2018-09-20 06:03:56] [NOTICE] configuration file found at: /etc/repmgr.conf
[2018-09-20 06:03:56] [INFO] connecting to database 'user=replication_user password=replication_pass host=pgmaster dbname=replication_db port=5432 connect_timeout=2'
[2018-09-20 06:03:56] [INFO] connected to database, checking its state
[2018-09-20 06:03:56] [INFO] checking cluster configuration with schema 'repmgr_pg_cluster'
[2018-09-20 06:03:56] [INFO] checking node 1 in cluster 'pg_cluster'
[2018-09-20 06:03:56] [INFO] reloading configuration file
[2018-09-20 06:03:56] [INFO] configuration has not changed
[2018-09-20 06:03:56] [INFO] starting continuous master connection check


Here is the log for the slave. It appears that the primary database is cloned successfully:

> ```
> >>> Setting up STOP handlers...
> >>> STARTING SSH (if required)...
> No pre-populated ssh keys!
> cp: cannot stat '/home/postgres/.ssh/keys/*': No such file or directory
> >>> SSH is not enabled!
> >>> SETTING UP POLYMORPHIC VARIABLES (repmgr=3+postgres=9 | repmgr=4, postgres=10)...
> >>> Cleaning data folder which might have some garbage...
> >>> Check all partner nodes for common upstream node...
> >>>>>> Checking NODE=pgmaster...
> psql: could not connect to server: Connection refused
>         Is the server running on host "pgmaster" ( and accepting
>         TCP/IP connections on port 5432?
> >>>>>> Skipping: failed to get master from the node!
> >>>>>> Checking NODE=pgslave1...
> psql: could not connect to server: Connection refused
>         Is the server running on host "pgslave1" ( and accepting
>         TCP/IP connections on port 5432?
> >>>>>> Skipping: failed to get master from the node!
> >>> Auto-detected master name: ''
> >>> Setting up repmgr...
> >>> Setting up repmgr config file '/etc/repmgr.conf'...
> >>> Setting up upstream node...
> cat: /var/lib/postgresql/data/standby.lock: No such file or directory
> >>> Previously Locked standby upstream node LOCKED_STANDBY=''
> >>> Waiting for upstream postgres server...
> >>> Wait schema replication_db.repmgr_pg_cluster on pgmaster:5432(user: replication_user,password: *******), will try 30 times with delay 10 seconds (TIMEOUT=300)
> psql: could not connect to server: Connection refused
>         Is the server running on host "pgmaster" ( and accepting
>         TCP/IP connections on port 5432?
> >>>>>> Host pgmaster:5432 is not accessible (will try 30 times more)
> >>>>>> Schema replication_db.repmgr_pg_cluster is still not accessible on host pgmaster:5432 (will try 29 times more)
> >>>>>> Schema replication_db.repmgr_pg_cluster is still not accessible on host pgmaster:5432 (will try 28 times more)
> >>>>>> Schema replication_db.repmgr_pg_cluster is still not accessible on host pgmaster:5432 (will try 27 times more)
> >>>>>> Schema replication_db.repmgr_pg_cluster exists on host pgmaster:5432!
> >>> Sending in background postgres start...
> >>> Waiting for upstream postgres server...
> >>> Wait schema replication_db.repmgr_pg_cluster on pgmaster:5432(user: replication_user,password: *******), will try 30 times with delay 10 seconds (TIMEOUT=300)
> >>>>>> Schema replication_db.repmgr_pg_cluster exists on host pgmaster:5432!
> >>> Starting standby node...
> >>> Instance hasn't been set up yet.
> >>> Clonning primary node...
> >>> Waiting for upstream postgres server...
> >>> Wait schema replication_db.repmgr_pg_cluster on pgmaster:5432(user: replication_user,password: *******), will try 30 times with delay 10 seconds (TIMEOUT=300)
> NOTICE: destination directory '/var/lib/postgresql/data' provided
> INFO: connecting to upstream node
> INFO: Successfully connected to upstream node. Current installation size is 37 MB
> INFO: checking and correcting permissions on existing directory /var/lib/postgresql/data ...
> >>>>>> Schema replication_db.repmgr_pg_cluster exists on host pgmaster:5432!
> >>> Waiting for cloning on this node is over(if any in progress): CLEAN_UP_ON_FAIL=, INTERVAL=30
> >>> Replicated: 4
> NOTICE: starting backup (using pg_basebackup)...
> INFO: executing: '/usr/lib/postgresql/10/bin/pg_basebackup -l "repmgr base backup"  -D /var/lib/postgresql/data -h pgmaster -p 5432 -U replication_user -c fast -X stream -S repmgr_slot_2 '
> NOTICE: standby clone (using pg_basebackup) complete
> NOTICE: you can now start your PostgreSQL server
> HINT: for example : pg_ctl -D /var/lib/postgresql/data start
> HINT: After starting the server, you need to register this standby with "repmgr standby register"
> [REPMGR EVENT] Node id: 2; Event type: standby_clone; Success [1|0]: 1; Time: 2018-09-20 06:04:08.427899+00;  Details: Cloned from host 'pgmaster', port 5432; backup method: pg_basebackup; --force: Y
> >>> Configuring /var/lib/postgresql/data/postgresql.conf
> >>>>>> Will add configs to the exists file
> >>>>>> Adding config 'shared_preload_libraries'=''repmgr_funcs''
> >>> Starting postgres...
> >>> Waiting for local postgres server recovery if any in progress:LAUNCH_RECOVERY_CHECK_INTERVAL=30
> >>> Recovery is in progress:
> 2018-09-20 06:04:08.517 UTC [163] LOG:  listening on IPv4 address "", port 5432
> 2018-09-20 06:04:08.517 UTC [163] LOG:  listening on IPv6 address "::", port 5432
> 2018-09-20 06:04:08.521 UTC [163] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
> 2018-09-20 06:04:08.549 UTC [171] LOG:  database system was interrupted; last known up at 2018-09-20 06:04:06 UTC
> 2018-09-20 06:04:09.894 UTC [171] LOG:  entering standby mode
> 2018-09-20 06:04:09.903 UTC [171] LOG:  redo starts at 0/2000028
> 2018-09-20 06:04:09.908 UTC [171] LOG:  consistent recovery state reached at 0/20000F8
> 2018-09-20 06:04:09.908 UTC [163] LOG:  database system is ready to accept read only connections
> 2018-09-20 06:04:09.916 UTC [175] LOG:  started streaming WAL from primary at 0/3000000 on timeline 1
> >>> Cloning is done
> >>>>>> WAL id: 000000010000000000000003
> >>> Not in recovery state (anymore)
> >>> Waiting for local postgres server start...
> >>> Wait schema replication_db.public on pgslave1:5432(user: replication_user,password: *******), will try 9 times with delay 10 seconds (TIMEOUT=90)
> >>>>>> Schema replication_db.public exists on host pgslave1:5432!
> >>> Unregister the node if it was done before
> >>> Registering node with role standby
> INFO: connecting to standby database
> INFO: connecting to master database
> INFO: retrieving node list for cluster 'pg_cluster'
> INFO: registering the standby
> [REPMGR EVENT] Node id: 2; Event type: standby_register; Success [1|0]: 1; Time: 2018-09-20 06:04:38.676889+00;  Details:
> INFO: standby registration complete
> NOTICE: standby node correctly registered for cluster pg_cluster with id 2 (conninfo: user=replication_user password=replication_pass host=pgslave1 dbname=replication_db port=5432 connect_timeout=2)
>  Locking standby (NEW_UPSTREAM_NODE_ID=1)...
> >>> Starting repmgr daemon...
> [2018-09-20 06:04:38] [NOTICE] looking for configuration file in current directory
> [2018-09-20 06:04:38] [NOTICE] looking for configuration file in /etc
> [2018-09-20 06:04:38] [NOTICE] configuration file found at: /etc/repmgr.conf
> [2018-09-20 06:04:38] [INFO] connecting to database 'user=replication_user password=replication_pass host=pgslave1 dbname=replication_db port=5432 connect_timeout=2'
> [2018-09-20 06:04:38] [INFO] connected to database, checking its state
> [2018-09-20 06:04:38] [INFO] connecting to master node of cluster 'pg_cluster'
> [2018-09-20 06:04:38] [INFO] retrieving node list for cluster 'pg_cluster'
> [2018-09-20 06:04:38] [INFO] checking role of cluster node '1'
> [2018-09-20 06:04:38] [INFO] checking cluster configuration with schema 'repmgr_pg_cluster'
> [2018-09-20 06:04:38] [INFO] checking node 2 in cluster 'pg_cluster'
> [2018-09-20 06:04:38] [INFO] reloading configuration file
> [2018-09-20 06:04:38] [INFO] configuration has not changed
> [2018-09-20 06:04:38] [INFO] starting continuous standby node monitoring


Here is the pgpool log:

> >>> STARTING SSH (if required)...
> cp: cannot stat '/home/postgres/.ssh/keys/*': No such file or directory
> No pre-populated ssh keys!
> >>> SSH is not enabled!
> >>> Opening access from all hosts by md5 in /usr/local/etc/pool_hba.conf
> >>> Adding user pcp_user for PCP
> >>> Creating a ~/.pcppass file for pcp_user
> >>> Adding users for md5 auth
> >>>>>> Adding user monkey_user
> >>> Adding check user 'monkey_user' for md5 auth
> >>> Adding user 'monkey_user' as check user
> >>> Adding user 'monkey_user' as health-check user
> >>> Adding backends
> >>>>>> Waiting for backend 0 to start pgpool (WAIT_BACKEND_TIMEOUT=60)
> 2018/09/20 06:03:26 Waiting for host: tcp://pgmaster:5432
> 2018/09/20 06:04:26 Timeout after 1m0s waiting on dependencies to become available: [tcp://pgmaster:5432]
> >>>>>> Will not add node 0 - it's unreachable!
> >>>>>> Waiting for backend 1 to start pgpool (WAIT_BACKEND_TIMEOUT=60)
> 2018/09/20 06:04:26 Waiting for host: tcp://pgslave1:5432
> 2018/09/20 06:05:26 Timeout after 1m0s waiting on dependencies to become available: [tcp://pgslave1:5432]
> >>>>>> Will not add node 1 - it's unreachable!
> >>> Checking if we have enough backends to start
> >>>>>> Will start pgpool REQUIRE_MIN_BACKENDS=0, BACKENDS_COUNT=0
> >>> Configuring /usr/local/etc/pgpool.conf
> >>>>>> Adding config 'num_init_children' with value '250'
> >>>>>> Adding config 'max_pool' with value '4'
> 2018-09-20 06:05:26: pid 62: LOG:  Backend status file /var/log/postgresql/pgpool_status does not exist
> 2018-09-20 06:05:26: pid 62: LOG:  Setting up socket for
> 2018-09-20 06:05:26: pid 62: LOG:  Setting up socket for :::5432
> 2018-09-20 06:05:26: pid 62: LOG:  find_primary_node_repeatedly: waiting for finding a primary node
> 2018-09-20 06:05:26: pid 320: FATAL:  pgpool is not accepting any new connections
> 2018-09-20 06:05:26: pid 320: DETAIL:  all backend nodes are down, pgpool requires at least one valid node
> 2018-09-20 06:05:26: pid 320: HINT:  repair the backend nodes and restart pgpool
> 2018-09-20 06:05:26: pid 62: LOG:  child process with pid: 320 exits with status 256
> 2018-09-20 06:05:26: pid 62: LOG:  fork a new child process with pid: 333
> 2018-09-20 06:06:26: pid 319: FATAL:  pgpool is not accepting any new connections
> 2018-09-20 06:06:26: pid 319: DETAIL:  all backend nodes are down, pgpool requires at least one valid node
> 2018-09-20 06:06:26: pid 319: HINT:  repair the backend nodes and restart pgpool
> 2018-09-20 06:06:26: pid 62: LOG:  child process with pid: 319 exits with status 256
> 2018-09-20 06:06:26: pid 62: LOG:  fork a new child process with pid: 351
> 2018-09-20 06:07:26: pid 333: FATAL:  pgpool is not accepting any new connections
> 2018-09-20 06:07:26: pid 333: DETAIL:  all backend nodes are down, pgpool requires at least one valid node
> 2018-09-20 06:07:26: pid 333: HINT:  repair the backend nodes and restart pgpool
> 2018-09-20 06:07:26: pid 62: LOG:  child process with pid: 333 exits with status 256
> 2018-09-20 06:07:26: pid 62: LOG:  fork a new child process with pid: 370
> 2018-09-20 06:08:26: pid 370: FATAL:  pgpool is not accepting any new connections
> 2018-09-20 06:08:26: pid 370: DETAIL:  all backend nodes are down, pgpool requires at least one valid node
> 2018-09-20 06:08:26: pid 370: HINT:  repair the backend nodes and restart pgpool
> 2018-09-20 06:08:26: pid 62: LOG:  child process with pid: 370 exits with status 256
> 2018-09-20 06:08:26: pid 62: LOG:  fork a new child process with pid: 388
> 2018-09-20 06:09:27: pid 302: FATAL:  pgpool is not accepting any new connections
> 2018-09-20 06:09:27: pid 302: DETAIL:  all backend nodes are down, pgpool requires at least one valid node
> 2018-09-20 06:09:27: pid 302: HINT:  repair the backend nodes and restart pgpool
> 2018-09-20 06:09:27: pid 62: LOG:  child process with pid: 302 exits with status 256
> 2018-09-20 06:09:27: pid 62: LOG:  fork a new child process with pid: 406
> 2018-09-20 06:10:27: pid 316: FATAL:  pgpool is not accepting any new connections
> 2018-09-20 06:10:27: pid 316: DETAIL:  all backend nodes are down, pgpool requires at least one valid node
> 2018-09-20 06:10:27: pid 316: HINT:  repair the backend nodes and restart pgpool
> 2018-09-20 06:10:27: pid 62: LOG:  child process with pid: 316 exits with status 256
> 2018-09-20 06:10:27: pid 62: LOG:  fork a new child process with pid: 424
> 2018-09-20 06:11:27: pid 351: FATAL:  pgpool is not accepting any new connections
> 2018-09-20 06:11:27: pid 351: DETAIL:  all backend nodes are down, pgpool requires at least one valid node
> 2018-09-20 06:11:27: pid 351: HINT:  repair the backend nodes and restart pgpool
> 2018-09-20 06:11:27: pid 62: LOG:  child process with pid: 351 exits with status 256
> 2018-09-20 06:11:27: pid 62: LOG:  fork a new child process with pid: 442

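The pgpool log shows plain TCP waits on `pgmaster:5432` and `pgslave1:5432` timing out, even though the master log reports PostgreSQL listening on port 5432 from 06:03:30. One way to narrow this down is to repeat that TCP check by hand from inside the pgpool container. The following is a minimal sketch using only the Python standard library. The `can_connect` and `report` helpers are hypothetical, and the sketch assumes a Python 3 interpreter is available wherever it runs, which the postdock images may not provide.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers name-resolution failures, refused connections, and timeouts.
        return False


def report(hosts=("pgmaster", "pgslave1"), port: int = 5432) -> None:
    """Print one reachability line per backend host named in the compose file."""
    for host in hosts:
        status = "reachable" if can_connect(host, port) else "NOT reachable"
        print(f"{host}:{port} is {status}")
```

Calling `report()` from inside the pgpool container prints one line per backend. If the names fail to resolve or the connections time out there, while the same databases answer from the host through the published ports 5431 and 5441, that would suggest a container networking problem and not a replication problem.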

I thought this was an issue with WAL shipping, but according to the logs the standby clones the database successfully and also registers with the cluster. The problem appears to be with pgpool, and I don't see what I am missing.

Any help would be greatly appreciated.

Thanks.

From czarny94 on the GitHub issues page:

Try changing the "createdb" line in the /src/pgsql/bin/postgres/primary/entrypoint.sh file. The diff between origin/master and my version after the change is below:

diff --git a/src/pgsql/bin/postgres/primary/entrypoint.sh b/src/pgsql/bin/postgres/primary/entrypoint.sh
index b8451f5..030cbc7 100755
--- a/src/pgsql/bin/postgres/primary/entrypoint.sh
+++ b/src/pgsql/bin/postgres/primary/entrypoint.sh
@@ -3,11 +3,11 @@ set -e
 FORCE_RECONFIGURE=1 postgres_configure

 echo ">>> Creating replication db '$REPLICATION_DB'"
-echo "host replication $REPLICATION_USER md5" >> $PGDATA/pg_hba.conf
+echo "host replication $REPLICATION_USER trust" >> $PGDATA/pg_hba.conf

