
Effective way to write table to SAP HANA from R

I have a table (df) of about 50,000 rows and 12 columns to write to SAP HANA. I use the RJDBC library and write row by row as follows:

# Returns the SQL statement to insert one row
build_insert_string <- function(db_output, row) {
  row_string <- paste(row, collapse = "','")
  statement <- paste('INSERT INTO "', db_output$SCHEMA, '"."', db_output$table_name,
                     '" (', db_output$string_of_columns, ') VALUES (\'', row_string, '\');',
                     sep = '')
  return(statement)
}

# Insert row by row
for (i in 1:nrow(df)) {
  tryCatch({
    dbGetQuery(jdbcConnection, build_insert_string(db_output, df[i, ]))
  }, error = function(e) { handle_db_errors(e) })
}

where db_output is a list containing the output constants (schema, table and columns).

Currently, it takes almost half a day to write the table. It seems that HANA does not support batch inserts such as:

INSERT INTO example
  (example_id, name, value, other_value)
VALUES
  (100, 'Name 1', 'Value 1', 'Other 1'),
  (101, 'Name 2', 'Value 2', 'Other 2'),
  (102, 'Name 3', 'Value 3', 'Other 3'),
  (103, 'Name 4', 'Value 4', 'Other 4');

Has anyone encountered this challenge, and if so, did you find a way to circumvent it and improve the write performance?

I'll leave this here for posterity:

While dbGetQuery is the clean solution for large tables (it executes the query and then clears the result set after each insertion), it is also slow.
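
One small change worth noting: RJDBC also provides dbSendUpdate(), which executes a statement without fetching a result set, so it avoids some of the per-row overhead of dbGetQuery. A sketch, reusing the same connection and helpers as in the question:

# dbSendUpdate() executes the statement without result-set handling;
# same jdbcConnection, build_insert_string() and handle_db_errors() as above
for (i in 1:nrow(df)) {
  tryCatch({
    dbSendUpdate(jdbcConnection, build_insert_string(db_output, df[i, ]))
  }, error = function(e) { handle_db_errors(e) })
}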

Apparently, multiple INSERTs into SAP HANA are successful when sent from the SQL editor but not when sent from R.
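
A workaround that is often suggested for HANA versions that reject the multi-row VALUES form is to pack several rows into a single INSERT ... SELECT ... FROM DUMMY ... UNION ALL statement. A hedged sketch in the style of build_insert_string above (untested here, assumes all values are strings, and is subject to HANA's limits on statement size):

# Builds one INSERT that selects each row from the DUMMY table,
# chained with UNION ALL -- a common workaround when multi-row
# VALUES lists are rejected
build_union_insert <- function(db_output, rows) {
  selects <- apply(rows, 1, function(row) {
    paste0("SELECT '", paste(row, collapse = "','"), "' FROM DUMMY")
  })
  paste0('INSERT INTO "', db_output$SCHEMA, '"."', db_output$table_name,
         '" (', db_output$string_of_columns, ') ',
         paste(selects, collapse = " UNION ALL "))
}

# Send the table in chunks of, say, 1000 rows per statement
chunk_size <- 1000
for (start in seq(1, nrow(df), by = chunk_size)) {
  end <- min(start + chunk_size - 1, nrow(df))
  dbSendUpdate(jdbcConnection, build_union_insert(db_output, df[start:end, ]))
}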

A (really) fast solution would be provided by:

dbWriteTable(
  conn        = jdbcConnection,
  name        = paste0(db_output$SCHEMA, ".", db_output$table_name),
  value       = df,
  row.names   = FALSE,
  field.types = db_output$string_of_columns,
  append      = TRUE
)

However, dbWriteTable() is not meant for large tables (it will throw a memory limit error). This limitation can be circumvented by increasing the memory allocation pool via the Xmx Java option, such as: options(java.parameters="-Xmx5000m"). Use it at your own peril, especially if you aim to automate the writing of increasingly big tables.
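
Note that the heap option only takes effect if it is set before rJava initialises the JVM, i.e. before RJDBC is loaded for the first time in the session (the 5000m value is just the figure from above):

# Must run before library(RJDBC) / any rJava initialisation,
# otherwise the option is silently ignored by the already-running JVM
options(java.parameters = "-Xmx5000m")
library(RJDBC)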

Another potential solution we explored was to export the R output as .csv (or multiple .csv files in the case of more than 1 million rows), and then send a query to import the .csv files into SAP HANA. Large csv files get imported very fast into SAP HANA, but this solution entails an extra step (an intermediary .csv output) and it is more prone to incorrect data importation.
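
For completeness, a sketch of the csv route (the file path is illustrative; IMPORT FROM CSV FILE requires the file to be accessible to the HANA server and the user to have the IMPORT privilege):

# Write the data frame out, then ask HANA to bulk-load it;
# the path must be readable by the HANA server process
write.csv(df, "/shared/exports/df_export.csv", row.names = FALSE)

import_statement <- paste0(
  "IMPORT FROM CSV FILE '/shared/exports/df_export.csv' INTO \"",
  db_output$SCHEMA, "\".\"", db_output$table_name, "\" ",
  "WITH RECORD DELIMITED BY '\\n' FIELD DELIMITED BY ',' SKIP FIRST 1 ROW"
)
dbSendUpdate(jdbcConnection, import_statement)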
