I'm dumping a database to separate CSV files, using the shell script below:
PGENGINE=$PGHOME/bin

# Arguments: $1=password, $2=port, $3=host, $4=user, $5=database, $6=output subdirectory
PGPASSWORD=$1 $PGENGINE/psql -p $2 -h $3 -U $4 -Atc "select tablename from pg_tables where schemaname='public'" $5 |
while read -r TBL; do
    echo "Exporting table $TBL"
    PGPASSWORD=$1 $PGENGINE/psql -p $2 -h $3 -U $4 -c "COPY public.$TBL TO STDOUT WITH CSV HEADER DELIMITER '$SEPARATEUR_CSV'" $5 > "/$ROOT_PATH/$6/$TBL.csv"
    echo -e "$TBL: Export done\n"
done
This works fine on my test database, but I am concerned about what will happen when I run it against a production database.
I have seen many posts saying that pg_dump acquires a lock on the data, but I don't know how psql's COPY behaves, especially given that I'm looping over all tables. I need to be sure that if a user updates one of my tables while the export runs, the COPY commands will still see consistent data and valid foreign keys.
My questions:
Is this a proper way to do it? Would a stored procedure be safer for data consistency?
What would be the most efficient way to achieve this? (This production database is quite large; some tables have over 30 million rows.)
A consistent read across tables in a live database is achieved by starting a transaction at the REPEATABLE READ isolation level and ending it only after everything has been read. Your script must be restructured into a single psql
invocation, looking like this:
psql [connection arguments] << EOF
BEGIN;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
\copy table1 TO file1.csv
\copy table2 TO file2.csv
\copy table3 TO file3.csv
COMMIT;
EOF
Note \copy instead of COPY, as a consequence of grouping everything into the same psql invocation: psql itself routes each \copy's data to its own client-side output file.
Also, it's a two-step workflow: first generate the above script (for example by looping in bash over the result of psql -c 'select tablename....', or any other method), then execute the script.
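As a sketch of the generation step: the table names below are hardcoded for illustration (in the real script they come from the psql -Atc query over pg_tables), and printf is used instead of echo so the backslash in \copy survives regardless of the shell:

```shell
#!/bin/sh
# Step 1: generate the psql script that wraps all exports in one transaction.
# TABLES is a hardcoded stand-in for the real list obtained via:
#   psql -Atc "select tablename from pg_tables where schemaname='public'"
TABLES="customers orders invoices"
SCRIPT=dump_transaction.sql

printf 'BEGIN;\nSET TRANSACTION ISOLATION LEVEL REPEATABLE READ;\n' > "$SCRIPT"
for TBL in $TABLES; do
    # printf keeps the literal backslash of the \copy meta-command
    printf '\\copy %s TO %s.csv WITH CSV HEADER\n' "$TBL" "$TBL" >> "$SCRIPT"
done
printf 'COMMIT;\n' >> "$SCRIPT"

cat "$SCRIPT"
# Step 2, not run in this sketch:
#   psql [connection arguments] -f "$SCRIPT"
```

All \copy commands then run in the same session, inside the same REPEATABLE READ snapshot.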
Why can't it be simplified to one step?
The loop cannot be implemented inside the psql script itself, because psql has no loop construct, except to some extent with \gexec; but \gexec is not applicable here, because \copy is a meta-command and \gexec handles only SQL commands.
The loop also cannot be implemented in plpgsql, unless the premise of the question changes, because the output of each COPY TO STDOUT would not be routed to a corresponding per-table client-side file; it would all come back to the client concatenated into a single stream. Using the SQL command COPY TO 'file' would work, but you need to be superuser (or a member of pg_write_server_files on recent versions), and the files end up on the server, not on the client.
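For completeness, here is a sketch of that server-side variant. It assumes superuser privileges (or the pg_write_server_files role on PostgreSQL 11+), and /tmp/dump is a hypothetical directory on the database server's filesystem, not the client's:

```sql
-- Server-side export sketch: the backend writes the files, so they land
-- on the server. The single transaction keeps the snapshot consistent.
BEGIN ISOLATION LEVEL REPEATABLE READ;
DO $$
DECLARE
    t text;
BEGIN
    FOR t IN SELECT tablename FROM pg_tables WHERE schemaname = 'public'
    LOOP
        -- format() quotes the identifier (%I) and the file path (%L)
        EXECUTE format('COPY public.%I TO %L WITH CSV HEADER',
                       t, '/tmp/dump/' || t || '.csv');
    END LOOP;
END
$$;
COMMIT;
```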
I finally ended up with this solution:
PGENGINE=$PGHOME/bin
CHEMIN_SCRIPT_TRANSACTION=/$ROOT_PATH/plc/proc/tmp/dump_transaction.sql
DOSSIER_DUMP_FICHIERS=/$ROOT_PATH/dump/dump_$6/dump_fichiers

# Open the script with a transaction at REPEATABLE READ isolation
echo "BEGIN; SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;" > "$CHEMIN_SCRIPT_TRANSACTION"

# Append one \copy per table of the public schema
PGPASSWORD=$1 $PGENGINE/psql -p $2 -h $3 -U $4 -Atc "select tablename from pg_tables where schemaname='public'" $5 |
while read -r TBL; do
    echo "\copy $TBL TO $DOSSIER_DUMP_FICHIERS/$TBL.csv WITH CSV HEADER DELIMITER ';';" >> "$CHEMIN_SCRIPT_TRANSACTION"
    echo "\echo" >> "$CHEMIN_SCRIPT_TRANSACTION"
done

echo "COMMIT;" >> "$CHEMIN_SCRIPT_TRANSACTION"

# Play the generated script in a single psql session
PGPASSWORD=$1 $PGENGINE/psql -p $2 -h $3 -U $4 -d $5 -f "$CHEMIN_SCRIPT_TRANSACTION"
I generate the script into a separate file, then use psql -f to play it.