
PostgreSQL COPY of a CSV file breaking before completion. Works with psql \copy but not COPY

I'm working with what I think is a fairly large file that I need to ingest into a PostgreSQL database. Below are a few example rows from the .csv file that the COPY command reads to insert into the database. In each row, sample_id and otu_id are foreign keys that refer to the primary keys of a sample table and an otu table.

sample_id,otu_id,count
163,2901,0.0
164,2901,0.0
165,2901,0.0

The file is ingested into the table with the following code, using SQLAlchemy:

self._engine.execute(
    text('''COPY otu.sample_otu from :csv CSV header''').execution_options(autocommit=True),
    csv=fname)

After copying from the .csv file into the table, I query the database for the samples shown. It returns a row for sample_id=163, otu_id=2901, but none of the rows after that. If I'm correct, the COPY command stops copying at the first error it encounters, so my guess is that there's a problem with the row for sample_id 164 and otu_id 2901.

I've tried the following:

  1. There is a valid entry in the otu table for 2901, likewise for the sample table id 164 so I don't think it's a missing key error.

  2. I have also searched the file for duplicate foreign key combinations and I can't seem to find any.

  3. I've tried writing only every second entry into the .csv file, in case the problem was related to how large the file was, but I got the same issue with the cut-off at a different point. When copying only entries with even otu_ids, the subsequent table query for otu_ids > 2890 breaks at sample id 152, otu id 2900.

I tried using psql's \copy command to manually copy from the .csv file:

\copy sample_otu FROM 'bpaotu-ijpgihw6' WITH DELIMITER  ',' CSV HEADER;

This seems to work perfectly fine. The query shows otu_ids past the otu id 2901.

I'm very confused as to why it breaks there: the .csv rows before and after that point look identical, and the referenced primary key values all exist in the sample and otu tables.

For anyone that comes across this:

The problem was a simple scope error. I missed an indent on the block that writes the file in the Python database importing script, so it was attempting to COPY from the .csv file before the file had finished being written. The COPY statement didn't throw any errors because it treated whatever was in the file at that moment as the complete file, when it was actually still being written to.

I moved the indentation so the COPY is run after the file writing block has closed and it now works.
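As a rough illustration of the mistake, here is a minimal sketch that needs no database. It assumes the file is written with Python's csv module inside a with block; the helper count_data_rows and the file name sample_otu.csv are made up for this example and are not from my actual script. Reading the file from inside the block stands in for running COPY too early, and reading it after the block stands in for the fixed version.

```python
import csv
import os
import tempfile


def count_data_rows(path):
    """Count the CSV data rows currently visible on disk (header excluded)."""
    with open(path, newline="") as f:
        return max(sum(1 for _ in csv.reader(f)) - 1, 0)


# Example rows taken from the question; the path is a throwaway temp file.
rows = [(163, 2901, 0.0), (164, 2901, 0.0), (165, 2901, 0.0)]
path = os.path.join(tempfile.mkdtemp(), "sample_otu.csv")

with open(path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["sample_id", "otu_id", "count"])
    writer.writerows(rows)
    # Wrong place for the import: the file is still open, so some or all
    # of the rows may still sit in Python's write buffer, not on disk.
    seen_inside = count_data_rows(path)

# Right place for the import: leaving the with block closed the file,
# which flushed every buffered row to disk.
seen_after = count_data_rows(path)

print("rows visible inside the with block:", seen_inside)
print("rows visible after the with block: ", seen_after)
```

How many rows are visible inside the block depends on buffering, which would explain why the cut-off point moved when I changed what was written to the file.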

It also explains why the psql \copy command worked and the scripted COPY didn't: I ran the manual \copy after the script had finished and the file was completely written, whereas the scripted COPY read the file while it was still incomplete.
