How to quickly export data from R to SQL Server

The standard RODBC package's sqlSave function, even as a single INSERT statement (parameter fast = TRUE), is terribly slow for large amounts of data because the load is not minimally logged. How can I write data to my SQL Server with minimal logging so that it writes much more quickly?

Currently trying:

toSQL = data.frame(...);
sqlSave(channel,toSQL,tablename="Table1",rownames=FALSE,colnames=FALSE,safer=FALSE,fast=TRUE);

By writing the data to a CSV file locally and then running a BULK INSERT (which has no ready-made wrapper function akin to sqlSave), the data can be written to MS SQL Server very quickly.

toSQL = data.frame(...);
write.table(toSQL,"C:\\export\\filename.txt",quote=FALSE,sep=",",row.names=FALSE,col.names=FALSE,append=FALSE);
sqlQuery(channel, "BULK INSERT Yada.dbo.yada
                   FROM '\\\\<server-that-SQL-server-can-see>\\export\\filename.txt'
                   WITH (
                     FIELDTERMINATOR = ',',
                     ROWTERMINATOR = '\\n'
                   )");
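Since there is no prebuilt function for this, the two steps above can be wrapped in small helpers. The following is only a sketch: the names write_bulk_file and bulk_insert_sql are hypothetical, the na = "" setting and the optional TABLOCK hint are assumptions that go beyond the original snippet, and the table name and UNC path are the placeholders from the example above.

```r
# Hypothetical helper: write a data frame in the layout the BULK INSERT above
# expects (no header, no row names, no quoting).
write_bulk_file <- function(df, path, sep = ",") {
  # Unquoted output is ambiguous if a value contains the separator or a newline
  unsafe <- vapply(df, function(col) {
    txt <- as.character(col)
    any(grepl(sep, txt, fixed = TRUE) | grepl("\n", txt, fixed = TRUE))
  }, logical(1))
  if (any(unsafe)) {
    stop("Columns contain the separator or a newline: ",
         paste(names(df)[unsafe], collapse = ", "))
  }
  # Assumption: NA is written as an empty field rather than the text "NA"
  write.table(df, path, quote = FALSE, sep = sep, na = "",
              row.names = FALSE, col.names = FALSE, append = FALSE)
  invisible(path)
}

# Hypothetical helper: build the BULK INSERT statement as a string.
# server_path must be a path that the SQL Server machine itself can read.
bulk_insert_sql <- function(table, server_path, sep = ",", tablock = FALSE) {
  # Double single quotes so the path stays inside its SQL string literal
  safe_path <- gsub("'", "''", server_path, fixed = TRUE)
  opts <- c(paste0("FIELDTERMINATOR = '", sep, "'"),
            "ROWTERMINATOR = '\\n'",
            if (tablock) "TABLOCK")  # optional table-lock hint (assumption)
  paste0("BULK INSERT ", table, "\n",
         "FROM '", safe_path, "'\n",
         "WITH (", paste(opts, collapse = ", "), ")")
}

# Demo on a temporary file; no database connection is needed for this part
demo <- data.frame(animal = c("dog", "cat"), weight = c(45, 8))
demo_file <- write_bulk_file(demo, tempfile(fileext = ".txt"))
sql <- bulk_insert_sql(
  "Yada.dbo.yada",
  "\\\\<server-that-SQL-server-can-see>\\export\\filename.txt"
)
cat(sql, "\n")

# With an open RODBC channel, the statement would then be run as:
# sqlQuery(channel, sql)
```

The check for embedded separators is there because quote = FALSE gives BULK INSERT no way to tell a comma inside a value from a field boundary; a different delimiter can be passed through sep if the data contains commas.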

SQL Server must have permission to access the network folder holding the CSV file, or else this process will not work. It takes some setup with various permissions (access to the network folder and BULK ADMIN privileges), but the gain in speed is well worth it.

I completely agree that BULK INSERT is the right option for any data that is not tiny. However, if you only need to add 2-3 rows of, say, debug messages, BULK INSERT seems like overkill.

The answer to your question would be the DBI::dbWriteTable() function. Example below (I am connecting my R code to an AWS RDS instance of MS SQL Express):

library(DBI)
library(RJDBC)
library(tidyverse)

# Specify where your driver lives
drv <- JDBC(
  "com.microsoft.sqlserver.jdbc.SQLServerDriver",
  "c:/R/SQL/sqljdbc42.jar") 

# Connect to AWS RDS instance
conn <- drv %>%
  dbConnect(
    host = "jdbc:sqlserver://xxx.ccgqenhjdi18.ap-southeast-2.rds.amazonaws.com",
    user = "xxx",
    password = "********",
    port = 1433,
    dbname= "qlik")

if(0) { # check what the conn object has access to
  queryResults <- conn %>%
    dbGetQuery("select * from information_schema.tables")
}

# Create test data
example_data <- data.frame(animal=c("dog", "cat", "sea cucumber", "sea urchin"),
                           feel=c("furry", "furry", "squishy", "spiny"),
                           weight=c(45, 8, 1.1, 0.8))
# Works in 20ms in my case
system.time(
  conn %>% dbWriteTable(
    "qlik.export.test",
    example_data
  )
)

# Let us see if we see the exported results
conn %>% dbGetQuery("select * FROM qlik.export.test")

# Let's clean the mess and force-close connection at the end of the process
conn %>% dbDisconnect()

It works pretty fast for small amounts of data and seems rather elegant if you want a data.frame -> SQL table solution.
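For the debug-message case, here is a rough sketch of appending a few rows at a time instead of creating a table. The helper names, the table qlik.export.debug_log and its two columns are all hypothetical. It relies on the append = TRUE argument that dbWriteTable methods commonly accept, so check that your driver supports it.

```r
# Hypothetical helper: build a small data frame of debug messages.
# Column names (logged_at, message) are made up for illustration.
make_debug_rows <- function(messages, when = Sys.time()) {
  messages <- as.character(messages)
  data.frame(
    logged_at = rep(format(when, "%Y-%m-%d %H:%M:%S"), length(messages)),
    message = messages,
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper: append the rows to a table through DBI.
# append = TRUE asks the backend to add rows rather than create the table.
append_debug_rows <- function(conn, table, messages) {
  rows <- make_debug_rows(messages)
  DBI::dbWriteTable(conn, table, rows, append = TRUE)
  invisible(nrow(rows))
}

# The rows can be inspected without any database connection
print(make_debug_rows(c("export started", "export finished")))

# With the conn object from the example above (before dbDisconnect):
# append_debug_rows(conn, "qlik.export.debug_log",
#                   c("export started", "export finished"))
```

Keeping the data-frame construction separate from the write makes it easy to see exactly what will be sent before touching the database.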

Enjoy!
