
PostgreSQL COPY FROM Command Help

I have a CSV file which is quite large (a few hundred MB) that I am trying to import into a Postgres table. The problem arises when there is a primary key violation (a duplicate record in the CSV file).

If it were a one-off I could manually filter out those records, but these files are generated by a program that produces such data every hour. My script has to import them into the database automatically.

My question is: is there some way I can set a flag in the COPY command, or in Postgres, so that it skips the duplicate records and continues importing the file into the table?
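For context, here is a minimal sketch of the failing load, assuming a hypothetical table named `readings` with an `id` primary key and a hypothetical file path (your schema and path will differ). A plain COPY aborts the whole load at the first constraint violation, and the command itself has no option to skip offending rows:

```sql
-- Hypothetical target table; names are stand-ins for the real schema.
CREATE TABLE readings (
    id          bigint PRIMARY KEY,
    recorded_at timestamptz,
    value       numeric
);

-- COPY aborts the entire load on the first primary-key violation;
-- there is no built-in "skip duplicates" flag on this command.
COPY readings FROM '/path/to/hourly.csv' CSV HEADER;
```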

My thought would be to approach this in two ways:

  1. Use a utility that can help create an "exception report" of duplicate rows during the COPY process, such as this one.
  2. Change your workflow by loading the data into a temp table first, massaging it for duplicates (perhaps JOINing with your target table and marking everything already present with a dup flag), and then importing only the missing records while sending the dups to an exception table, as sketched below.

I personally prefer the second approach, but that depends on the specific workflow in your case.
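Here is a minimal sketch of that second approach. It assumes a hypothetical `readings` table keyed on `id`, a `readings_exceptions` table with the same columns, and a hypothetical file path; adapt the names and the key column to your schema:

```sql
BEGIN;

-- Stage the raw CSV in a constraint-free temp table so COPY itself
-- can never fail on a duplicate key (LIKE without INCLUDING copies
-- only the column definitions, not the primary key).
CREATE TEMP TABLE readings_stage (LIKE readings) ON COMMIT DROP;
COPY readings_stage FROM '/path/to/hourly.csv' CSV HEADER;

-- Route rows whose keys already exist in the target to an exception
-- table (assumed here to have the same columns as readings).
INSERT INTO readings_exceptions
SELECT s.*
FROM readings_stage s
JOIN readings r ON r.id = s.id;

-- Import only the missing records; DISTINCT ON also guards against
-- the same key appearing twice within a single CSV file.
INSERT INTO readings
SELECT DISTINCT ON (s.id) s.*
FROM readings_stage s
WHERE NOT EXISTS (SELECT 1 FROM readings r WHERE r.id = s.id)
ORDER BY s.id;

COMMIT;
```

The NOT EXISTS filter keeps the sketch compatible with older Postgres versions; on 9.5 and later, the final step could instead use `INSERT ... ON CONFLICT DO NOTHING`.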
