简体   繁体   中英

Upsert from R to MS SQL Server Syntax

I want to be able to use an Upsert, where I update existing rows of conditions change or append new rows as they become available. I want to code this from within R studio and pair with a MS SQL Server. I have replicated the workflow below using the iris data set. I think I am almost there, but can not quite finish off the SQL query. Also, open to suggestions on workflow.

> pacman::p_load(DBI, dbplyr, dplyr, odbc)
> 
> # Connection
> con <- DBI::dbConnect(odbc::odbc(), 
+                       Driver = "SQL Server", 
+                       Server = "localhost\\SQLEXPRESS", 
+                       Database = "master", 
+                       Trusted_Connection = "True")
> 
> Data.DWH <- dbReadTable(con, "iris") %>% 
+     unite("Lookup_Key", colnames(select(., - ID)), remove = FALSE)
> 
> # Data in data warehouse
> head(Data.DWH)
              Lookup_Key Sepal.Length Sepal.Width Petal.Length Petal.Width Species ID
1 5.1_3.5_1.4_0.2_setosa          5.1         3.5          1.4         0.2  setosa  1
2   4.9_3_1.4_0.2_setosa          4.9         3.0          1.4         0.2  setosa  2
3 4.7_3.2_1.3_0.2_setosa          4.7         3.2          1.3         0.2  setosa  3
4 4.6_3.1_1.5_0.2_setosa          4.6         3.1          1.5         0.2  setosa  4
5   5_3.6_1.4_0.2_setosa          5.0         3.6          1.4         0.2  setosa  5
6 5.4_3.9_1.7_0.4_setosa          5.4         3.9          1.7         0.4  setosa  6
> 
> # New data example (created), 1 entry to append, 1 entry to ignore, 1 entry to update
> 
> New.Data.Raw <- data.frame(stringsAsFactors=FALSE,
+    Sepal.Length = c(5.1, 1, 4.9),
+     Sepal.Width = c(3.5, 2, 3),
+    Petal.Length = c(1.4, 3, 1.4),
+     Petal.Width = c(2, 4, 0.2),
+         Species = c("setosa", "setosa", "setosa"),
+              ID = c(1, 151, 2)
+ ) %>% unite("Lookup_Key", colnames(select(., - ID)), remove = FALSE)
> 
> head(New.Data.Raw)
            Lookup_Key Sepal.Length Sepal.Width Petal.Length Petal.Width Species  ID
1 5.1_3.5_1.4_2_setosa          5.1         3.5          1.4         2.0  setosa   1
2       1_2_3_4_setosa          1.0         2.0          3.0         4.0  setosa 151
3 4.9_3_1.4_0.2_setosa          4.9         3.0          1.4         0.2  setosa   2
> 
> # Ready for insert/update
> # check for changes in the look up key (mash up of row values) or new entries according to ID
> New.Data <- New.Data.Raw %>% 
+     filter(!ID %in% Data.DWH$ID |
+                (ID %in% Data.DWH$ID & !Lookup_Key %in% Data.DWH$Lookup_Key))
> 
> head(New.Data)
            Lookup_Key Sepal.Length Sepal.Width Petal.Length Petal.Width Species  ID
1 5.1_3.5_1.4_2_setosa          5.1         3.5          1.4           2  setosa   1
2       1_2_3_4_setosa          1.0         2.0          3.0           4  setosa 151
>
> # Construct sql query for ms sql ===========================================================================
> 
> # construct columns for query - produces 1 string
> cols <- paste0('(',paste0(colnames(New.Data), collapse=', '),')')
> 
> # construct values for query - produce 1 string
> vals <- paste0(
+     apply(New.Data,1,function(x) paste0("('", paste0(x, collapse = "', '"), "')")), collapse = ", ")
> 
> # construct update values for query
> insertVals <- paste0('(',paste0('s.',colnames(New.Data), collapse=', '),')')
> 
> # construct update set for query
> updateSet <- paste0(colnames(New.Data%>%select(-ID)),
+                     ' = s.',colnames(New.Data%>%select(-ID)), collapse=', ')
> 
> # construct upsert query (does not currently work!)
> queryNew.Data <- paste0('MERGE iris AS t ',
+                             'USING (VALUES ',vals,') AS s',cols,
+                             ' ON t.ID = s.ID ',
+                             'WHEN MATCHED THEN ',
+                             'UPDATE SET ',updateSet,
+                             ' WHEN NOT MATCHED THEN ',
+                             'INSERT',cols,
+                             ' VALUES',insertVals,';')
> 
> queryNew.Data <- gsub("\\b'\\b","",queryNew.Data) 
> queryNew.Data <- gsub("'NA'", 'NULL', queryNew.Data)
> 
> # send the query to the database (no lucky currently)
> DBI::dbGetQuery(con, queryNew.Data)

Worked it out, code works fine. Just need to fix the naming convention of columns. SQL does not handle '.' within names, so change to '_' and code will work as desired.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM