How to write a table in PostgreSQL from R?

Currently, to insert data into a PostgreSQL table I have to create an empty table and then run insert into table values ... with the data frame collapsed into a single string containing all the values. This doesn't work for large data frames.

dbWriteTable() doesn't work for PostgreSQL and gives the following error:

Error in postgresqlpqExec(new.con, sql4) : RS-DBI driver: (could not Retrieve the result : ERROR: syntax error at or near "STDIN" LINE 1: COPY "table_1" FROM STDIN

I have tried the following hack, suggested in an answer to a similar earlier question: How do I write data from R to PostgreSQL tables with an autoincrementing primary key?

body_lines <- deparse(body(RPostgreSQL::postgresqlWriteTable))
new_body_lines <- sub(
  'postgresqlTableRef(name), "FROM STDIN")', 
  'postgresqlTableRef(name), "(", paste(shQuote(names(value)), collapse = ","), ") FROM STDIN")', 
  body_lines,
  fixed = TRUE
)
fn <- RPostgreSQL::postgresqlWriteTable
body(fn) <- parse(text = new_body_lines)
while("RPostgreSQL" %in% search()) detach("package:RPostgreSQL")
assignInNamespace("postgresqlWriteTable", fn, "RPostgreSQL")

This hack still doesn't work for me. postgresqlWriteTable() throws exactly the same error. What exactly is the problem here?

As an alternative, I tried dbWriteTable2() from the caroline package, and it throws a different error:

Error in postgresqlExecStatement(conn, statement, ...) : 
  RS-DBI driver: (could not Retrieve the result : ERROR:  column "id" does not exist in table_1
)
creating NAs/NULLs for for fields of table that are missing in your df
Error in postgresqlExecStatement(conn, statement, ...) : 
  RS-DBI driver: (could not Retrieve the result : ERROR:  column "id" does not exist in table_1
)

Is there any other method to write a large data frame directly into a PostgreSQL table?

OK, I'm not sure why dbWriteTable() would be failing; there may be some kind of version/protocol mismatch. Perhaps you could try installing the latest versions of R and the RPostgreSQL package, and upgrading the PostgreSQL server on your system, if possible.

Regarding the insert into workaround failing for large data: when large amounts of data must be moved and a one-shot transfer is infeasible, impractical, or flaky, the usual approach in the IT world is batching, or batch processing. Basically, you divide the data into smaller chunks and send the chunks one at a time.

As a random example, a few years ago I wrote some Java code to query employee information from an HR LDAP server that was constrained to provide only 1000 records at a time. So I had to write a loop that kept sending the same request (with the query state tracked using some kind of weird cookie-based mechanism) and accumulating the records into a local database until the server reported the query complete.

Here's some code that manually constructs the SQL to create an empty table based on a given data.frame, and then inserts the content of the data.frame into the table using a parameterized batch size. It's mostly built around calls to paste() to build the SQL strings, and dbSendQuery() to send the actual queries. I also use postgresqlDataType() for the table creation.
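The chunking idea itself can be sketched in a few lines of base R, independent of any database. This is only an illustration: chunk_rows is a hypothetical helper name, and the 250-row data frame is made-up test data.

```r
## Split row indices 1..n into consecutive chunks of at most `size` rows.
## `chunk_rows` is a hypothetical helper, not part of any package.
chunk_rows <- function(n, size = 100L) {
    if (n == 0L) return(list())
    unname(split(seq_len(n), (seq_len(n) - 1L) %/% size))
}

## made-up test data
df <- data.frame(id = 1:250, x = runif(250))
chunks <- chunk_rows(nrow(df), size = 100L)
lengths(chunks)
## [1] 100 100  50

for (rows in chunks) {
    batch <- df[rows, , drop = FALSE]
    ## send `batch` to the server here, e.g. as one INSERT statement
}
```

Each pass through the loop handles one batch, so a failure affects only that chunk rather than the whole transfer.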

## connect to the DB
library('RPostgreSQL'); ## loads DBI automatically
drv <- dbDriver('PostgreSQL');
con <- dbConnect(drv,host=...,port=...,dbname=...,user=...,password=...);

## define helper functions
createEmptyTable <- function(con,tn,df) {
    sql <- paste0("create table \"",tn,"\" (",paste0(collapse=',','"',names(df),'" ',sapply(df[0,],postgresqlDataType)),");");
    dbSendQuery(con,sql);
    invisible();
};

insertBatch <- function(con,tn,df,size=100L) {
    if (nrow(df)==0L) return(invisible());
    cnt <- (nrow(df)-1L)%/%size+1L;
    for (i in seq(0L,len=cnt)) {
        sql <- paste0("insert into \"",tn,"\" values (",do.call(paste,c(sep=',',collapse='),(',lapply(df[seq(i*size+1L,min(nrow(df),(i+1L)*size)),],shQuote))),");");
        dbSendQuery(con,sql);
    };
    invisible();
};

## generate test data
NC <- 1e2L; NR <- 1e3L; df <- as.data.frame(replicate(NC,runif(NR)));

## run it
tn <- 't1';
dbRemoveTable(con,tn);
createEmptyTable(con,tn,df);
insertBatch(con,tn,df);
res <- dbReadTable(con,tn);
all.equal(df,res);
## [1] TRUE

Note that I didn't bother prepending a row.names column to the database table. dbWriteTable() adds such a column by default; passing row.names = FALSE suppresses it.
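One caveat about the code above: shQuote() is a shell-quoting function, so it is fine for the numeric test data but will not escape character values the way SQL expects. A minimal sketch of SQL-style literal quoting follows, assuming only numeric and character columns; sql_literal and build_insert are hypothetical helper names, not part of RPostgreSQL or DBI.

```r
## Hypothetical helpers, for illustration only.
## Turn one column into SQL literals: numbers as-is, strings in single
## quotes with embedded single quotes doubled, and NA as NULL.
sql_literal <- function(x) {
    if (is.numeric(x)) {
        out <- as.character(x)
    } else {
        out <- paste0("'", gsub("'", "''", as.character(x), fixed = TRUE), "'")
    }
    out[is.na(x)] <- "NULL"
    out
}

## Build a single multi-row INSERT statement for the data frame `df`.
build_insert <- function(tn, df) {
    cols <- lapply(df, sql_literal)
    rows <- do.call(paste, c(unname(cols), sep = ","))
    paste0('insert into "', tn, '" values (', paste(rows, collapse = "),("), ");")
}

df <- data.frame(id = 1:2, name = c("O'Brien", NA), stringsAsFactors = FALSE)
cat(build_insert("t1", df), "\n")
## insert into "t1" values (1,'O''Brien'),(2,NULL);
```

This does not handle dates, infinite values, or other column types. DBI also provides dbQuoteString(), which may be a better fit when a connection is available.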

I had the same error while working through this example.

This worked for me:

dbWriteTable(con, "cartable", value = df, overwrite = TRUE, append = FALSE, row.names = FALSE)

I had already configured a table "cartable" in pgAdmin, so an empty table existed and I had to overwrite it with the values.

The batch processing answer given earlier is 99.99% correct. However, it doesn't work on Windows because the insertBatch function needs one small extra argument. (I was not able to add a comment on that answer.)

The shQuote function requires the argument type = 'cmd2' for it to work.

However, to add an argument there, you need this answer:

https://stackoverflow.com/questions/6827299/r-apply-function-with-multiple-parameters

So, the new insertBatch function becomes:

insertBatch <- function(con,tn,df,size=100L) {
    if (nrow(df)==0L) return(invisible());
    cnt <- (nrow(df)-1L)%/%size+1L;
    for (i in seq(0L,len=cnt)) {
        sql <- paste0("insert into \"",tn,"\" values (",do.call(paste,c(sep=',',collapse='),(',lapply(df[seq(i*size+1L,min(nrow(df),(i+1L)*size)),],shQuote,type = 'cmd2'))),");");
        dbSendQuery(con,sql);
    };
    invisible();
};
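A likely reason for the Windows failure, offered as an explanation rather than something stated in the original answers: when type is not given, shQuote() uses "cmd" on Windows, which wraps values in double quotes, and PostgreSQL reads double-quoted text as an identifier rather than a value. The quoting styles can be compared on any platform by passing type explicitly:

```r
x <- 0.5
shQuote(x, type = "sh")    # single quotes: a valid SQL literal
## [1] "'0.5'"
shQuote(x, type = "cmd")   # double quotes: PostgreSQL treats this as an identifier
## [1] "\"0.5\""
shQuote(x, type = "cmd2")  # no surrounding quotes: a bare number
## [1] "0.5"
```

Since "cmd2" adds no surrounding quotes, it suits numeric columns like the test data here; character columns would still need proper SQL quoting.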
