
Python psql \copy CSV to remote server

I am attempting to copy a CSV (which has a header and uses " as its quote character) with Python 3.6 to a table on a remote Postgres 10 server. It is a large CSV (2.5M rows, 800MB), and while I previously imported it into a dataframe and then used dataframe.to_sql, this was very memory intensive, so I switched to using COPY.

Using COPY with psycopg2 or sqlalchemy would work fine, but the remote server does not have access to the local file system.

Using psql in the terminal I have successfully run the query below to populate the table. I don't think using \copy is possible with psycopg2 or sqlalchemy.

\copy table (col1, col2) FROM file_location CSV HEADER QUOTE '"' NULL ''

However, when I try to use a one-line psql -c command like the one below, it does not work and I get the error:

ERROR: COPY quote must be a single one-byte character.

psql -U user -h ip -d db -w pw -c "\copy table (col1, col2) FROM file_location CSV HEADER QUOTE '"' NULL ''"

Could you tell me why this is the case?

This one-line psql -c statement would be easier to implement with the subprocess module in Python than having to open a terminal and execute a command, which I'm not sure how to do. If you could suggest a workaround or a different methodology, that would be great.

====== Per Andrew's suggestion to escape the quote character, this worked on the command line. However, when implementing it in Python like below, a new error comes up:

/bin/sh: -c: line 0: unexpected EOF while looking for matching `''

/bin/sh: -c: line 1: syntax error: unexpected end of file

copy_statement = "\"\copy table (col1, col2) FROM file_location CSV HEADER QUOTE '\"' NULL ''\""
cmd = f'psql -U {user} -h {ip} -d {db} -w {pw} -c {copy_statement}'
subprocess.call(cmd, shell=True)

Try not to use shell=True if you can avoid it. It is better to tokenize the command yourself rather than leaving that to sh.

# -w (--no-password) takes no value; the password comes from PGPASSWORD or ~/.pgpass
subprocess.call(["psql", "-U", user, "-h", ip, "-d", db, "-w", "-c", copy_statement])

In this case your copy statement can be written exactly as psql should see it, because it is passed to psql verbatim and there are no shell quoting issues to take into account. (NB: you still have to quote it for Python, so the string would remain as is.)
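To see the "passed verbatim" behaviour without a database, here is a minimal sketch that sends the statement to a child Python process in list form and compares what the child received with what was sent. The helper name echo_argument is made up for this illustration, and the child process merely stands in for psql.

```python
import subprocess
import sys

# The statement exactly as psql should receive it. Only Python escaping is
# applied here: \\ is one backslash and \" is one double quote.
copy_statement = "\\copy table (col1, col2) FROM file_location CSV HEADER QUOTE '\"' NULL ''"


def echo_argument(argument):
    """Start a child Python process (a stand-in for psql) and return the
    first command-line argument exactly as the child received it."""
    child_code = "import sys; sys.stdout.write(sys.argv[1])"
    result = subprocess.run(
        [sys.executable, "-c", child_code, argument],  # list form, no shell
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout


received = echo_argument(copy_statement)
print(received == copy_statement)
```

Because no shell sits between the two processes, the quote characters in the statement need no additional escaping beyond what the Python string literal itself requires.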


If you still want to use shell=True then you have to escape the string literal for both Python and the shell:

"\"\copy table (col1, col2) FROM file_location CSV HEADER QUOTE '\\\"' NULL ''\""

will create a string in Python which will be

"\copy table (col1, col2) FROM file_location CSV HEADER QUOTE '\"' NULL ''"

Which is what we found out we needed on our shell in the first place!
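The quoting behaviour can be checked without running psql. The sketch below uses the standard shlex module, which follows POSIX-like rules and is assumed here to be a close enough approximation of how /bin/sh splits a command line. It tokenizes three variants: the statement naively wrapped in double quotes, the hand-escaped form described above, and a form produced by shlex.quote, which generates the shell quoting for you.

```python
import shlex

# The statement as psql should finally see it (Python escaping only).
statement = "\\copy table (col1, col2) FROM file_location CSV HEADER QUOTE '\"' NULL ''"

# 1. Naive: wrap in double quotes. The embedded " closes the quoted section
#    early, leaving an unterminated single quote at the end of the line.
naive = 'psql -c "' + statement + '"'
try:
    shlex.split(naive)
    naive_parses = True
except ValueError:
    naive_parses = False

# 2. Hand-escaped: put a backslash before the embedded ". This is enough for
#    this particular statement, but it is not a general escaping routine.
hand_escaped = 'psql -c "' + statement.replace('"', '\\"') + '"'
hand_tokens = shlex.split(hand_escaped)

# 3. shlex.quote: let the standard library produce the shell quoting.
quoted = "psql -c " + shlex.quote(statement)
quoted_tokens = shlex.split(quoted)

print(naive_parses)
print(hand_tokens[-1] == statement)
print(quoted_tokens[-1] == statement)
```

If shell=True really is required, building the string with shlex.quote is usually less error-prone than counting backslashes by hand.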


Edit (clarifying something from the comments):

subprocess.call, when not using shell=True, takes an iterable of arguments.

So you could have

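import os
import subprocess

# No shell is involved, so the statement is written exactly as psql should see it
# (only Python escaping applies: \\ is a backslash, \" is a double quote).
psql_command = "\\copy table (col1, col2) FROM file_location CSV HEADER QUOTE '\"' NULL ''"
# user, hostname, password, dbname all defined elsewhere above.
command = ["psql",
    "-U", user,
    "-h", hostname,
    "-d", dbname,
    "-w",  # --no-password takes no value; the password is supplied via PGPASSWORD
    "-c", psql_command,
]

subprocess.call(command, env=dict(os.environ, PGPASSWORD=password))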
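subprocess.call only returns the exit status, so a failed import is easy to miss. A minimal sketch of a wrapper that checks the status and surfaces psql's error output might look like the following. The function names build_psql_invocation and run_copy are hypothetical, and the sketch assumes the password is handed over through the PGPASSWORD environment variable, since psql's -w (--no-password) flag takes no value.

```python
import os
import subprocess


def build_psql_invocation(user, host, dbname, password, statement):
    """Return (argv, env) for running one statement through psql without a shell."""
    argv = [
        "psql",
        "-U", user,
        "-h", host,
        "-d", dbname,
        "-w",             # never prompt; the password comes from the environment
        "-c", statement,  # passed verbatim, no shell quoting needed
    ]
    env = dict(os.environ, PGPASSWORD=password)
    return argv, env


def run_copy(user, host, dbname, password, statement):
    """Run the statement and raise if psql reports a non-zero exit status."""
    argv, env = build_psql_invocation(user, host, dbname, password, statement)
    completed = subprocess.run(
        argv, env=env, stderr=subprocess.PIPE, universal_newlines=True
    )
    if completed.returncode != 0:
        raise RuntimeError("psql failed: " + completed.stderr.strip())


# Example call (needs psql installed and a reachable server):
# run_copy(user, hostname, dbname, password,
#          "\\copy table (col1, col2) FROM file_location CSV HEADER QUOTE '\"' NULL ''")
```

Keeping the argument construction in its own function makes it possible to inspect or test the exact argv without contacting a server.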

See https://docs.python.org/2/library/subprocess.html#subprocess.call or https://docs.python.org/3/library/subprocess.html#subprocess.call

Extra edit: please note that to avoid shell injection, you should be using the method described here. See the warning section of https://docs.python.org/2/library/subprocess.html#frequently-used-arguments
