
PSQL COPY tables to CSV - Data consistency on production database

I'm dumping a database to separate CSV files (one per table), using the shell script below:

PGENGINE=$PGHOME/bin
# Positional parameters: $1=password, $2=port, $3=host, $4=user, $5=database, $6=output subdirectory
PGPASSWORD=$1 $PGENGINE/psql -p $2 -h $3 -U $4 -Atc "select tablename from pg_tables where schemaname='public'" $5 |
while read TBL; do
    echo "Exporting table $TBL"
    PGPASSWORD=$1 $PGENGINE/psql -p $2 -h $3 -U $4 -c "COPY public.$TBL TO STDOUT WITH CSV HEADER DELIMITER '$SEPARATEUR_CSV'" $5 > /$ROOT_PATH/$6/$TBL.csv
    echo -e "$TBL: export done\n"
done

This works fine on my test database, but I am concerned about what will happen when I run it against the production database.

I have seen many posts saying that pg_dump acquires a lock on the data, but I don't know how psql COPY behaves, especially since I'm looping over all the tables. I need to be sure that if a user updates one of the tables during the export, the COPY commands will still see consistent data and matching foreign keys.

My questions:

  1. Do you think this is a proper way to do it? Would a stored procedure be safer for data consistency?

  2. What would be the most efficient way to achieve this? (This production database is quite large: some tables have over 30 million rows.)

A consistent read across tables in a live database is achieved by starting a transaction at the REPEATABLE READ isolation level and ending it when everything has been read. At that level, the transaction takes a snapshot at its first query, and every subsequent read sees that same snapshot, so concurrent updates cannot introduce inconsistencies between tables. Your script must be transformed so that there is only one psql invocation, looking like this:

psql [connection arguments] << EOF
BEGIN;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
\copy table1 TO file1.csv
\copy table2 TO file2.csv
\copy table3 TO file3.csv
COMMIT;
EOF

Note \copy instead of COPY, as a consequence of grouping everything into the same psql invocation: psql itself routes each \copy's data to its own client-side output file.
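For example, to keep the CSV options from the script in the question, each \copy line takes the same clause as COPY:

\copy table1 TO 'file1.csv' WITH CSV HEADER DELIMITER ';'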

Also, it's a two-step workflow: first generate the above script (for instance by looping in bash over the result of psql -c 'select tablename....' or any other method), then execute the script.

Why can't it be simplified to one step?

The loop cannot be implemented in the psql script because psql doesn't have loops, except somewhat with \gexec, but that's not applicable here: \copy is a meta-command, and \gexec handles only SQL commands.
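As an illustration of what \gexec does handle, each value returned by the query below is executed as a SQL statement (the ANALYZE is just an arbitrary example of a plain SQL command; note the absence of a trailing semicolon before \gexec):

SELECT format('ANALYZE public.%I', tablename)
FROM pg_tables
WHERE schemaname = 'public'
\gexec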

The loop also cannot be implemented in PL/pgSQL (without changing the premise of the question), because the output of each COPY ... TO STDOUT would not be routed to a corresponding per-table client-side file; it would come back to the client as a single concatenated stream. Using the SQL command COPY ... TO 'file' would work, but you need to be superuser and the files end up on the server, not on the client.
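For completeness, here is a minimal sketch of that server-side variant (the /tmp/dump path is hypothetical; this must run as superuser, or on PostgreSQL 11+ as a member of pg_write_server_files, and the files land on the database server):

DO $$
DECLARE
    t text;
BEGIN
    FOR t IN SELECT tablename FROM pg_tables WHERE schemaname = 'public'
    LOOP
        -- %I quotes the table name as an identifier, %L the file path as a literal
        EXECUTE format('COPY public.%I TO %L WITH CSV HEADER',
                       t, '/tmp/dump/' || t || '.csv');
    END LOOP;
END
$$;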

I finally ended up with this solution:

PGENGINE=$PGHOME/bin
# Positional parameters: $1=password, $2=port, $3=host, $4=user, $5=database, $6=dump name
CHEMIN_SCRIPT_TRANSACTION=/$ROOT_PATH/plc/proc/tmp/dump_transaction.sql
DOSSIER_DUMP_FICHIERS=/$ROOT_PATH/dump/dump_$6/dump_fichiers

# Open the transaction and set its isolation level
echo "BEGIN; SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;" > $CHEMIN_SCRIPT_TRANSACTION

# Append one \copy line per table of the public schema
PGPASSWORD=$1 $PGENGINE/psql -p $2 -h $3 -U $4 -Atc "select tablename from pg_tables where schemaname='public'" $5 |
while read TBL; do
    echo "\copy $TBL TO $DOSSIER_DUMP_FICHIERS/$TBL.csv WITH CSV HEADER DELIMITER ';';" >> $CHEMIN_SCRIPT_TRANSACTION
    echo "\echo " >> $CHEMIN_SCRIPT_TRANSACTION
done
echo "COMMIT;" >> $CHEMIN_SCRIPT_TRANSACTION

# Play the generated script in a single psql invocation
PGPASSWORD=$1 $PGENGINE/psql -p $2 -h $3 -U $4 -d $5 -f $CHEMIN_SCRIPT_TRANSACTION

I generate the script into a separate file and then use psql -f to play it.
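One safeguard worth adding (my suggestion, not part of the solution above): pass -v ON_ERROR_STOP=1 so that psql stops at the first failed \copy and exits with a nonzero status, letting the calling script detect an incomplete dump:

PGPASSWORD=$1 $PGENGINE/psql -p $2 -h $3 -U $4 -d $5 -v ON_ERROR_STOP=1 -f $CHEMIN_SCRIPT_TRANSACTION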
