PostgreSQL COPY of a CSV file breaking before completion. Works with psql \copy but not COPY

I'm working with a fairly large file that I need to ingest into a PostgreSQL database. Below are a few example rows from the .csv file that the COPY command reads to insert into the database. In each row, sample_id and otu_id are foreign keys that refer to the primary keys of a sample table and an otu table.

sample_id,otu_id,count
163,2901,0.0
164,2901,0.0
165,2901,0.0

The file is ingested into the table with the following code, using SQLAlchemy:

self._engine.execute(
    text('''COPY otu.sample_otu from :csv CSV header''').execution_options(autocommit=True),
    csv=fname)

After copying from the .csv file into the table, I query the database for the samples shown above. It returns a row for sample_id=163, otu_id=2901, but none of the rows after that. If I'm correct, the COPY command stops copying at the first error it encounters, so my guess is that there's a problem with the row for sample id 164 and otu id 2901.

I've tried the following:

  1. There is a valid entry in the otu table for 2901, and likewise in the sample table for id 164, so I don't think it's a missing key error.

  2. I have also searched the file for duplicate foreign key combinations and can't find any.

  3. I tried writing only every second entry to the .csv file that is copied from, in case the problem had something to do with how large the file was. That gave me the same issue, but cut off at a different point. When copying only the entries with even otu_ids, the subsequent table query for otu_ids > 2890 breaks at sample id 152, otu id 2900.
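The duplicate check in item 2 could be sketched roughly as follows. This is only an illustration, not the script used in the question: the helper name find_duplicate_pairs is hypothetical, and the third data row is made up so that the check has something to report.

```python
import csv
import io
from collections import Counter


def find_duplicate_pairs(lines):
    """Return (sample_id, otu_id) pairs that occur more than once.

    `lines` is any iterable of CSV lines with a header row,
    for example an open file object.
    """
    reader = csv.DictReader(lines)
    counts = Counter((row["sample_id"], row["otu_id"]) for row in reader)
    return sorted(pair for pair, n in counts.items() if n > 1)


# Small in-memory sample. The last row is invented for illustration
# so that one foreign key combination appears twice.
sample = io.StringIO(
    "sample_id,otu_id,count\n"
    "163,2901,0.0\n"
    "164,2901,0.0\n"
    "164,2901,1.0\n"
)
duplicates = find_duplicate_pairs(sample)
print(duplicates)  # [('164', '2901')]
```

For a real file, the same function can be given an open file handle, e.g. open(fname, newline=""), assuming the file has been completely written and closed first.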

I also tried using psql's \copy command to copy from the .csv file manually:

\copy sample_otu FROM 'bpaotu-ijpgihw6' WITH DELIMITER  ',' CSV HEADER;

This seems to work perfectly fine. The query shows otu_ids past otu id 2901.

I'm just very confused as to why it breaks there: the .csv rows before and after that point look identical, and the referenced tables contain entries for the corresponding primary key values.

For anyone who comes across this:

The problem was a simple scope error. I got the indentation wrong around the with block that writes the file in the Python database import script, so the script attempted to COPY from the .csv file before it had finished writing it. The COPY statement threw no error because it treated the file as complete when it was actually still being written.

I moved the indentation so that the COPY runs after the file-writing block has closed, and it now works.

It also explains why the psql \copy command worked and the scripted COPY didn't: I ran the manual copy after the file had already been fully written, by which point the scripted import had already failed to load everything.
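The effect can be sketched without a database. The following is a minimal illustration under assumptions, not the original import script: count_data_rows and export_rows are hypothetical helpers, and counting rows stands in for what a COPY issued at that point would be able to read.

```python
import csv
import os
import tempfile


def count_data_rows(path):
    """Count the rows after the header that are currently readable on disk."""
    with open(path, newline="") as f:
        return max(sum(1 for _ in csv.reader(f)) - 1, 0)


def export_rows(path, rows):
    """Write the CSV, checking what a reader sees before and after the file closes."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["sample_id", "otu_id", "count"])
        writer.writerows(rows)
        # Buggy position: the file is still open, so some rows may
        # still sit in the write buffer and not be on disk yet.
        seen_inside = count_data_rows(path)
    # Correct position: the with block has exited, which flushes
    # and closes the file, so every row is on disk.
    seen_after = count_data_rows(path)
    return seen_inside, seen_after


# Made-up rows shaped like the ones in the question.
rows = [(sample_id, 2901, 0.0) for sample_id in range(1, 1001)]
with tempfile.TemporaryDirectory() as tmp:
    inside, after = export_rows(os.path.join(tmp, "sample_otu.csv"), rows)

print("rows visible inside the with block:", inside)
print("rows visible after the with block:", after)
```

The exact number visible inside the block depends on buffering, which would be consistent with the import cutting off at a different point when the file contents changed. Only the count taken after the block is guaranteed to cover every row, so that is where the COPY call belongs.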
