
Effective way to write table to SAP HANA from R

I have a table (df) of about 50,000 rows and 12 columns to write to SAP HANA. I use the RJDBC library and write row by row as follows:

# Returns the SQL statement to insert one row
build_insert_string <- function(db_output, row) {
  row_string <- paste(row, collapse = "','")
  statement <- paste('INSERT INTO "', db_output$SCHEMA, '"."', db_output$table_name,
                     '" (', db_output$string_of_columns, ') VALUES (\'', row_string, '\');',
                     sep = '')
  return(statement)
}

# Insert row by row
for (i in 1:nrow(df)) {
  tryCatch({
    dbGetQuery(jdbcConnection, build_insert_string(db_output, df[i, ]))
  }, error = function(e) { handle_db_errors(e) })
}

where db_output is a list containing the output constants (schema, table and columns).

Currently, it takes almost half a day to write the table. It seems that HANA does not support batch inserts such as:

INSERT INTO example
  (example_id, name, value, other_value)
VALUES
  (100, 'Name 1', 'Value 1', 'Other 1'),
  (101, 'Name 2', 'Value 2', 'Other 2'),
  (102, 'Name 3', 'Value 3', 'Other 3'),
  (103, 'Name 4', 'Value 4', 'Other 4');

Has anyone encountered this challenge, and if so, did you find a way to circumvent it and improve the write performance?

I'll leave this here for posterity:

While dbGetQuery is the clean solution for large tables (it executes the query and then clears the result set after each insertion), it is also slow.
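
One small change worth noting: RJDBC also provides dbSendUpdate(), which executes a statement without fetching a result set, so it avoids some of the per-row overhead of dbGetQuery. A sketch, reusing the same connection and helpers as in the question:

# dbSendUpdate() executes the statement without result-set handling;
# same jdbcConnection, build_insert_string() and handle_db_errors() as above
for (i in 1:nrow(df)) {
  tryCatch({
    dbSendUpdate(jdbcConnection, build_insert_string(db_output, df[i, ]))
  }, error = function(e) { handle_db_errors(e) })
}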

Apparently, multiple INSERTs into SAP HANA are successful when sent from the SQL editor but not when sent from R.
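
A workaround that is often suggested for HANA versions that reject the multi-row VALUES form is to pack several rows into a single INSERT ... SELECT ... FROM DUMMY ... UNION ALL statement. A hedged sketch in the style of build_insert_string above (untested here, assumes all values are strings, and is subject to HANA's limits on statement size):

# Builds one INSERT that selects each row from the DUMMY table,
# chained with UNION ALL -- a common workaround when multi-row
# VALUES lists are rejected
build_union_insert <- function(db_output, rows) {
  selects <- apply(rows, 1, function(row) {
    paste0("SELECT '", paste(row, collapse = "','"), "' FROM DUMMY")
  })
  paste0('INSERT INTO "', db_output$SCHEMA, '"."', db_output$table_name,
         '" (', db_output$string_of_columns, ') ',
         paste(selects, collapse = " UNION ALL "))
}

# Send the table in chunks of, say, 1000 rows per statement
chunk_size <- 1000
for (start in seq(1, nrow(df), by = chunk_size)) {
  end <- min(start + chunk_size - 1, nrow(df))
  dbSendUpdate(jdbcConnection, build_union_insert(db_output, df[start:end, ]))
}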

A (really) fast solution would be provided by:

dbWriteTable(
  conn        = jdbcConnection,
  name        = paste0(db_output$SCHEMA, ".", db_output$table_name),
  value       = df,
  row.names   = FALSE,
  field.types = db_output$string_of_columns,
  append      = TRUE
)

However, dbWriteTable() is not meant for large tables (it will throw a memory limit error). This limitation can be circumvented by increasing the memory allocation pool via the Xmx Java option, such as: options(java.parameters="-Xmx5000m"). Use it at your own peril, especially if you aim to automate the writing of increasingly big tables.
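
Note that the heap option only takes effect if it is set before rJava initialises the JVM, i.e. before RJDBC is loaded for the first time in the session (the 5000m value is just the figure from above):

# Must run before library(RJDBC) / any rJava initialisation,
# otherwise the option is silently ignored by the already-running JVM
options(java.parameters = "-Xmx5000m")
library(RJDBC)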

Another potential solution we explored was to export the R output as .csv (or multiple .csv files in the case of more than 1 million rows), and then send a query to import the .csv files into SAP HANA. Large csv files get imported very fast into SAP HANA, but this solution entails an extra step (an intermediary .csv output) and it is more prone to incorrect data importation.
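
For completeness, a sketch of the csv route (the file path is illustrative; IMPORT FROM CSV FILE requires the file to be accessible to the HANA server and the user to have the IMPORT privilege):

# Write the data frame out, then ask HANA to bulk-load it;
# the path must be readable by the HANA server process
write.csv(df, "/shared/exports/df_export.csv", row.names = FALSE)

import_statement <- paste0(
  "IMPORT FROM CSV FILE '/shared/exports/df_export.csv' INTO \"",
  db_output$SCHEMA, "\".\"", db_output$table_name, "\" ",
  "WITH RECORD DELIMITED BY '\\n' FIELD DELIMITED BY ',' SKIP FIRST 1 ROW"
)
dbSendUpdate(jdbcConnection, import_statement)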
