简体   繁体   中英

Python psql \copy CSV to remote server

I am attempting to copy a csv (which has a header and quote character ") with python 3.6 to a table on a remote postgres 10 server. It is a large CSV (2.5M rows, 800MB) and while I previously imported it into a dataframe and then used dataframe.to_sql, this was very memory intensive so I switched to using COPY.

Using COPY with psycopg2 or sqlalchemy would work fine but the remote server does not have access to the local file system.

Using psql in the terminal I have successfully run the query below to populate the table. I don't think using \\copy is possible with psycopg2 or sqlalchemy.

\copy table (col1, col2) FROM file_location CSV HEADER QUOTE '"' NULL ''

However when I try to use a one line psql -c command like below, it does not work and I get the error:

ERROR: COPY quote must be a single one-byte character.

psql -U user -h ip -d db -w pw -c "\copy table (col1, col2) FROM file_location CSV HEADER QUOTE '"' NULL ''"

Could you tell me why this is the case?

This one-line -c psql statement would be easier to implement with the subprocess module in python than having to open a terminal and execute a command which I'm not sure how to do. If you could suggest a workaround or different methodology that would be great.

====== Per Andrew's suggestion to escape the quote character this worked on the command line. However when implementing it in python like below, a new error comes up:

/bin/sh: -c: line 0: unexpected EOF while looking for matching `''

/bin/sh: -c: line 1: syntax error: unexpected end of file

"\"\copy table (col1, col2) FROM file_location CSV HEADER QUOTE '\"' NULL ''\""
cmd = f'psql -U {user} -h {ip} -d {db} -w {pw} -c {copy_statement}'
subprocess.call(cmd, shell=True)

Try not to use shell=True if you can avoid it. better to tokenize the command yourself to help sh.

subprocess.call(["psql", "-U", "{user}", "-h", "{ip}", "-d", "{db}", "-w", "{pw}", "-c", "{copy statement}"])

In this case then your copy statement could be as it is passed to psql verbatim, because there are no shell quoting issues to take into account. (NB still have to quote this for python, so the string would remain as is).


If you still want to use shell=True then you have to escape the string literal for both python and shell

"\"\copy table (col1, col2) FROM file_location CSV HEADER QUOTE '\\\"' NULL ''\""

will create a string in python which will be

"\copy table (col1, col2) FROM file_location CSV HEADER QUOTE '\"' NULL ''\"

Which is what we found out we needed on our shell in the first place!


Edit (clarifying something from the comments) :

subprocess.call , when not using shell=True , takes an iterable of arguments.

So you could have

psql_command = "\"\copy table (col1, col2) FROM file_location CSV HEADER QUOTE '\\\"' NULL ''\""
# user, hostname, password, dbname all defined elsewhere above.
command = ["psql",
    "-U", user,
    "-h", hostname,
    "-d", dbname,
    "-w", password,
    "-c", psql_command,
]

subprocess.call(command)

See https://docs.python.org/2/library/subprocess.html#subprocess.call or https://docs.python.org/3/library/subprocess.html#subprocess.call

extra edit :- Please note that to avoid shell injection, you should be using the method described here. See the warning section of https://docs.python.org/2/library/subprocess.html#frequently-used-arguments

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM