
PSQL COPY tables to CSV - Data consistency on production database

I'm doing a dump of a database to separate CSV files, using the shell script below:

# $1 = password, $2 = port, $3 = host, $4 = user, $5 = database, $6 = output sub-directory
# $PGHOME, $ROOT_PATH and $SEPARATEUR_CSV are expected to be set beforehand
PGENGINE=$PGHOME/bin
PGPASSWORD=$1 $PGENGINE/psql -p $2 -h $3 -U $4 -Atc "select tablename from pg_tables where schemaname='public'" $5 |\
while read TBL; do
    echo "Exporting table "$TBL
    PGPASSWORD=$1 $PGENGINE/psql -p $2 -h $3 -U $4 -c "COPY public.$TBL TO STDOUT WITH CSV HEADER DELIMITER '"$SEPARATEUR_CSV"'" $5 > /$ROOT_PATH/$6/$TBL.csv
    echo -e $TBL ": Export done\n"
done

This works fine on my test database, but I am concerned about what will happen when running it on a production database.

I saw many topics saying that pg_dump acquires a lock on the data, but I don't know how psql COPY behaves, especially since I'm looping over all the tables. I need to be sure that if a user updates one of my tables, the COPY command will still get the right data and the right FKs.

My questions:

  1. Do you think this is a proper way to do it? Is a stored procedure safer for data consistency?

  2. What would be the most efficient way to achieve this? (This production database is quite large; some tables have over 30 million rows.)

A consistent read across tables in a live database is achieved by starting a transaction in REPEATABLE READ isolation mode and ending it once everything has been read. Your script must be transformed so that there is only one psql invocation, looking like this:

psql [connection arguments] << EOF
BEGIN;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
\copy table1 TO file1.csv
\copy table2 TO file2.csv
\copy table3 TO file3.csv
COMMIT;

EOF

Note \copy instead of COPY, as a consequence of having everything grouped into the same psql invocation. psql itself routes each \copy's data to its own client-side output file.

It's also a two-step workflow: first generate the above script (for instance by looping in bash over the result of psql -c 'select tablename....', or any other method), then execute the script.

Why can't it be simplified to one step?

The loop cannot be implemented in the psql script itself, because psql doesn't have loops, except somewhat with \gexec; but that is not applicable here, because \copy is a meta-command and \gexec handles only SQL commands.
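As an illustration (a minimal sketch, not part of the original answer, reusing the same [connection arguments] placeholder as above): \gexec can run generated SQL such as COPY ... TO STDOUT, but because the generated commands must be plain SQL rather than \copy meta-commands, the rows of every table come back concatenated on psql's single output stream instead of landing in separate client-side files:

psql [connection arguments] << EOF
SELECT format('COPY public.%I TO STDOUT WITH CSV HEADER', tablename)
FROM pg_tables
WHERE schemaname = 'public'
\gexec
EOF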

The loop also cannot be implemented in plpgsql (unless changing the context of the question), because each output of COPY TO STDOUT would not be routed to a corresponding per-table client-side file: everything would come back to the client concatenated into a single stream. Using the SQL command COPY TO file would work, but you need to be a superuser and the files end up on the server, not on the client.
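For completeness, here is a hedged sketch (not from the original answer) of that server-side variant: a plpgsql DO block driving COPY ... TO file. It requires superuser rights or membership in pg_write_server_files (PostgreSQL 11+), and /srv/dump is only an example directory that must exist and be writable by the PostgreSQL server process:

# quoting the here-document delimiter ('EOF') keeps the shell from expanding $$
psql [connection arguments] << 'EOF'
DO $$
DECLARE
    t text;
BEGIN
    FOR t IN SELECT tablename FROM pg_tables WHERE schemaname = 'public'
    LOOP
        -- COPY ... TO 'file' writes on the server's filesystem, not the client's
        EXECUTE format('COPY public.%I TO %L WITH (FORMAT csv, HEADER, DELIMITER '';'')',
                       t, '/srv/dump/' || t || '.csv');
    END LOOP;
END;
$$;
EOF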

I finally ended up with this solution:

PGENGINE=$PGHOME/bin
# Generated SQL script and target folder for the CSV files
CHEMIN_SCRIPT_TRANSACTION=/$ROOT_PATH/plc/proc/tmp/dump_transaction.sql
DOSSIER_DUMP_FICHIERS=/$ROOT_PATH/dump/dump_$6/dump_fichiers

# Start the script with a transaction that takes a single consistent snapshot
echo "BEGIN; SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;" > $CHEMIN_SCRIPT_TRANSACTION

# Append one \copy per table of the public schema
PGPASSWORD=$1 $PGENGINE/psql -p $2 -h $3 -U $4 -Atc "select tablename from pg_tables where schemaname='public'" $5 |\
while read TBL; do
    echo "\copy $TBL TO $DOSSIER_DUMP_FICHIERS/$TBL.csv WITH CSV HEADER DELIMITER ';';" >> $CHEMIN_SCRIPT_TRANSACTION
    echo "\echo " >> $CHEMIN_SCRIPT_TRANSACTION
done
echo "COMMIT;" >> $CHEMIN_SCRIPT_TRANSACTION

# Run the whole generated script in a single psql session
PGPASSWORD=$1 $PGENGINE/psql -p $2 -h $3 -U $4 -d $5 -f $CHEMIN_SCRIPT_TRANSACTION

I'm creating the script in a separate file and then using psql -f to run that script.
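For reference, the positional parameters appear to be, judging from how they are passed to psql: password, port, host, user, database, and a suffix used in the dump paths. A purely hypothetical invocation (the script name and all values below are illustrative) could then look like:

# dump_to_csv.sh is an assumed name for the script above
./dump_to_csv.sh 'secret' 5432 db.example.com dbuser mydatabase 20240101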
