How to ignore errors with psql \copy meta-command
I am using psql with a PostgreSQL database and the following copy command:
\COPY isa (np1, np2, sentence) FROM 'c:\Downloads\isa.txt' WITH DELIMITER '|'
I get:
ERROR: extra data after last expected column
How can I skip the lines with errors?
Up to and including Postgres 14, you cannot skip the errors without skipping the whole command. There is currently no more sophisticated error handling.
\copy is just a wrapper around SQL COPY that channels results through psql. The manual for COPY:
COPY stops operation at the first error. This should not lead to problems in the event of a COPY TO, but the target table will already have received earlier rows in a COPY FROM. These rows will not be visible or accessible, but they still occupy disk space. This might amount to a considerable amount of wasted disk space if the failure happened well into a large copy operation. You might wish to invoke VACUUM to recover the wasted space.
Bold emphasis mine. And:
COPY FROM will raise an error if any line of the input file contains more or fewer columns than are expected.
COPY is an extremely fast way to import / export data. Sophisticated checks and error handling would slow it down.
There was an attempt to add error logging to COPY in Postgres 9.0, but it was never committed.
Fix your input file instead.
If you have one or more additional columns in your input file and the file is otherwise consistent, you might add dummy columns to your table isa and drop those afterwards. Or (cleaner with production tables) import to a temporary staging table and INSERT selected columns (or expressions) to your target table isa from there.
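If the file is not consistent, one way to "fix your input file" is to separate the well-formed rows from the rest before running \copy. The following is only a sketch under stated assumptions: the sample rows, the temporary directory and the file names isa.clean.txt and isa.rejects.txt are made up for illustration, and the field count is a plain split on '|', so it does not understand backslash-escaped delimiters or CSV quoting.

```shell
#!/bin/bash
# Sketch: split a pipe-delimited file into rows with exactly the expected
# number of fields and rows that would make COPY fail.
# File names and sample data are hypothetical.

expected_cols=3
workdir=$(mktemp -d)
input_file="$workdir/isa.txt"
clean_file="$workdir/isa.clean.txt"
reject_file="$workdir/isa.rejects.txt"

# Sample input: the second line carries an extra column.
printf '%s\n' \
  'cat|animal|A cat is an animal.' \
  'dog|animal|A dog is an animal.|extra' \
  'oak|tree|An oak is a tree.' > "$input_file"

# Create both outputs up front so they exist even when empty.
: > "$clean_file"
: > "$reject_file"

awk -F'|' -v n="$expected_cols" -v ok="$clean_file" -v bad="$reject_file" '
  NF == n { print > ok; kept++; next }               # well-formed row
          { printf "%d: %s\n", NR, $0 > bad; rejected++ }  # keep line number for review
  END     { printf "kept %d rows, rejected %d rows\n", kept, rejected }
' "$input_file"
```

The clean file can then be loaded with the \copy command from the question, and the rejects file reviewed or repaired by hand.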
It is too bad that after 25 years Postgres still doesn't have an -ignore-errors flag or option for the COPY command. In this era of Big Data you get a lot of dirty records, and it can be very costly for a project to fix every outlier.
I had to work around it this way: copy into a dummy table, dummy_original_table, and let a trigger function forward each row to original_table, skipping duplicates and swallowing errors:
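CREATE OR REPLACE FUNCTION on_insert_in_original_table() RETURNS trigger AS $$
BEGIN
    -- Return NULL on duplicates so they never raise a 'duplicate index' error.
    IF EXISTS (SELECT 1 FROM original_table WHERE primary_key = NEW.primary_key) THEN
        RETURN NULL;
    END IF;
    BEGIN
        INSERT INTO original_table(datum, primary_key)
        VALUES (NEW.datum, NEW.primary_key)
        ON CONFLICT DO NOTHING;
    EXCEPTION
        WHEN OTHERS THEN
            NULL;  -- swallow any other error and skip the row
    END;
    -- Returning NULL from a BEFORE trigger keeps the row out of the dummy table.
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

-- The trigger definition is missing from the post; this one is an assumption
-- and the trigger name is illustrative.
CREATE TRIGGER forward_to_original_table
BEFORE INSERT ON dummy_original_table
FOR EACH ROW EXECUTE PROCEDURE on_insert_in_original_table();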
psql dbname -c "\copy dummy_original_table(datum,primary_key) FROM '/home/user/data.csv' DELIMITER E'\t'"
Here's one solution: import the batch file one line at a time. The performance can be much slower, but it may be sufficient for your scenario:
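#!/bin/bash
input_file=./my_input.csv
tmp_file=/tmp/one-line.csv
# IFS= and -r keep leading whitespace and backslashes in each line intact.
while IFS= read -r input_line; do
    printf '%s\n' "$input_line" > "$tmp_file"
    # \copy reads the file on the client side, so the server needs no access to it.
    psql my_database -c "\copy my_table FROM '$tmp_file' DELIMITER '|' CSV"
done < "$input_file"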
Additionally, you could modify the script to capture the psql stdout/stderr and exit status, and if the exit status is non-zero, echo $input_line and the captured stdout/stderr to stdout and/or append it to a file.
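The capture step just described could look roughly like this. It is a sketch, not a tested loader: my_database and my_table are the placeholders from the script above, while the sample data, the temporary directory and the failed_lines.log name are made up for illustration. Each line is piped to COPY ... FROM STDIN, so no temporary file is needed.

```shell
#!/bin/bash
# Sketch: load a file one line at a time and log every line psql rejects.
# my_database and my_table are placeholders; the sample data is made up.

workdir=$(mktemp -d)
input_file="$workdir/my_input.csv"
failed_file="$workdir/failed_lines.log"
printf '%s\n' 'a|b|c' 'd|e|f|g' > "$input_file"
: > "$failed_file"

while IFS= read -r input_line; do
    # Feed the single line to COPY via stdin; capture stdout and stderr together.
    # -w stops psql from prompting for a password inside the loop.
    if ! output=$(printf '%s\n' "$input_line" |
            psql -w my_database -c "COPY my_table FROM STDIN DELIMITER '|' CSV" 2>&1); then
        # Non-zero exit status: record the offending line and psql's message.
        printf 'line: %s\nerror: %s\n' "$input_line" "$output" >> "$failed_file"
    fi
done < "$input_file"

echo "rejected lines logged to $failed_file"
```

Lines that load successfully leave no trace in the log, so after the run the log holds exactly the rows that still need attention.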
Workaround: remove the reported errant line using sed and run \copy again
Later versions of Postgres (including Postgres 13) will report the line number of the error. You can then remove that line with sed and run \copy again, e.g.,
#!/bin/bash
bad_line_number=5 # assuming line 5 is the bad line
# Delete that one line and write the rest to a new file.
sed "${bad_line_number}d" input.csv > filtered.csv
[per the comment from @Botond_Balázs]
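Instead of typing the line number by hand, it can usually be pulled out of the error text. The sketch below assumes the message has the usual shape "CONTEXT:  COPY <table>, line <N>: ..."; the error text and the sample file here are hard-coded for illustration, whereas in practice you would capture the text from psql's stderr.

```shell
#!/bin/bash
# Sketch: extract the failing line number from a COPY error message and
# delete that line from the input. Sample data and error text are made up.

workdir=$(mktemp -d)
printf '%s\n' 'a|b|c' 'd|e|f|g' 'h|i|j' > "$workdir/input.csv"

error_text='ERROR:  extra data after last expected column
CONTEXT:  COPY isa, line 2: "d|e|f|g"'

# Take the number that follows "COPY <table>, line " on the CONTEXT line.
bad_line_number=$(printf '%s\n' "$error_text" |
    sed -n 's/^CONTEXT:  *COPY [^,]*, line \([0-9][0-9]*\).*/\1/p' | head -n 1)

if [ -n "$bad_line_number" ]; then
    # Drop the offending line; the original file is left untouched.
    sed "${bad_line_number}d" "$workdir/input.csv" > "$workdir/filtered.csv"
    echo "removed line $bad_line_number"
else
    echo "no line number found in error text" >&2
fi
```

Because COPY stops at the first error, this has to be repeated once per bad line, which is only practical when the file contains a handful of them.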