Effective way to write a table to SAP HANA from R
I have a table (`df`) of about 50,000 rows and 12 columns to write to SAP HANA. I use the `RJDBC` library and insert row by row as follows:
```r
# Returns the SQL statement to insert one row
build_insert_string <- function(db_output, row) {
  row_string <- paste(row, collapse = "','")
  statement <- paste('INSERT INTO "', db_output$SCHEMA, '"."', db_output$table_name,
                     '" (', db_output$string_of_columns, ') VALUES (\'', row_string, '\');',
                     sep = '')
  return(statement)
}

# Insert row by row
for (i in 1:nrow(df)) {
  tryCatch({
    dbGetQuery(jdbcConnection, build_insert_string(db_output, df[i, ]))
  }, error = function(e) { handle_db_errors(e) })
}
```
where `db_output` is a list containing the output constants (schema, table, and columns).

Currently, it takes almost half a day to write the table. It seems that HANA does not support multi-row batch inserts such as:
```sql
INSERT INTO example
  (example_id, name, value, other_value)
VALUES
  (100, 'Name 1', 'Value 1', 'Other 1'),
  (101, 'Name 2', 'Value 2', 'Other 2'),
  (102, 'Name 3', 'Value 3', 'Other 3'),
  (103, 'Name 4', 'Value 4', 'Other 4');
```
Did anyone encounter this challenge, and if so, did you find a way to circumvent it and improve the write performance?
I'll leave this here for posterity:
While `dbGetQuery` is the clean solution for large tables (it executes the query and then clears the result set after each insertion), it is also slow.

Apparently, multiple `INSERT`s into SAP HANA succeed when sent from the SQL editor, but not when sent from R.
A (really) fast solution is provided by:

```r
dbWriteTable(
  conn = jdbcConnection,
  name = paste0(db_output$SCHEMA, ".", db_output$table_name),
  value = df,
  row.names = FALSE,
  field.types = db_output$string_of_columns,
  append = TRUE
)
```
However, `dbWriteTable()` is not meant for large tables (it will throw a memory limit error). This limitation can be circumvented by increasing the Java memory allocation pool via the `Xmx` option, e.g. `options(java.parameters = "-Xmx5000m")` (this must be set before `rJava`/`RJDBC` is loaded, since it only takes effect when the JVM starts). Use it at your own peril, especially if you aim to automate the writing of increasingly big tables.
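An alternative to raising `-Xmx` is to keep the JVM's working set bounded by writing the table in fixed-size chunks with `append = TRUE`. A sketch, assuming the same `jdbcConnection`/`db_output` objects as above; the chunk size is an assumption to tune against your heap:

```r
# Split row indices 1..n into consecutive chunks of at most `size` rows
chunk_indices <- function(n, size) {
  split(seq_len(n), ceiling(seq_len(n) / size))
}

# Hypothetical chunked writer built on dbWriteTable(append = TRUE);
# each call only materializes `chunk_size` rows on the Java side.
write_in_chunks <- function(conn, db_output, df, chunk_size = 5000) {
  for (idx in chunk_indices(nrow(df), chunk_size)) {
    dbWriteTable(
      conn = conn,
      name = paste0(db_output$SCHEMA, ".", db_output$table_name),
      value = df[idx, , drop = FALSE],
      row.names = FALSE,
      append = TRUE
    )
  }
}
```

For the 50,000-row table in the question, `chunk_size = 5000` would issue ten `dbWriteTable` calls instead of one large one.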
Another potential solution we explored was to export the R output as a `.csv` (or multiple `.csv`s in the case of more than 1 million rows), and then send a query to import the `.csv`s into `SAP HANA`. Large `csv`s get imported very fast into SAP HANA, but this solution entails an extra step (an intermediary `.csv` output) and is more prone to incorrect data importation.
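The CSV route can be sketched as follows. The file path is a placeholder, and the `IMPORT FROM CSV FILE` options shown are assumptions to adapt to your HANA version; note the file must be readable by the HANA server itself (e.g. via a shared mount), not just by the R client:

```r
# Step 1: write the R output to a CSV the server can reach
# (hypothetical path; no row names, header row included)
write.csv(df, "/shared/path/df.csv", row.names = FALSE)

# Step 2: ask HANA to bulk-load the file into the (pre-existing) target table.
# SKIP FIRST 1 ROW skips the CSV header written by write.csv.
import_sql <- paste0(
  "IMPORT FROM CSV FILE '/shared/path/df.csv' INTO \"",
  db_output$SCHEMA, "\".\"", db_output$table_name, "\" ",
  "WITH RECORD DELIMITED BY '\\n' FIELD DELIMITED BY ',' SKIP FIRST 1 ROW"
)
dbGetQuery(jdbcConnection, import_sql)
```

Because the load runs server-side, this avoids pushing rows one by one over JDBC, which is why it is fast; the trade-off is the extra file-handling step and the looser type checking noted above.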