I have a question similar to this Stackoverflow post .
How can I create a persistent table from a SQL query in a database (I use a DB2 database)? My goal is to use a table from one schema and to permanently create a more or less modified table in another schema.
What works so far is to pull the data to R
and subsequently create a table in a different schema:
dplyr::tbl(con, in_schema("SCHEMA_A", "TABLE")) %>%
collect() %>%
DBI::dbWriteTable(con, Id(schema = "SCHEMA_B", table = "NEW_TABLE"), ., overwrite = TRUE)
However, I'd like to incorporate the compute()
function in a dplyr
pipeline such that I do not have to pull the data into R, that is, I'd like keep the data on the database. As a side note: I do not know how I would substitute the DBI
's dbWriteTable()
for dplyr
's copy_to()
– being able to do that would also help me.
Unfortunately, I am not able to make it work, even after reading ?compute()
and its online documentation . The following code framework does not work and results in an error:
dplyr::tbl(con, in_schema("SCHEMA_A", "TABLE")) %>%
dplyr::compute(in_schema("SCHEMA_B", "NEW_TABLE"), analyze = FALSE, temporary = FALSE)
Is there a solution for using compute()
or some other solution applicable to a dplyr pipeline?
I use a custom function that takes the SQL query behind a remote table, converts in into a query that can be executed on the SQL server to save a new table, and then executes that query using the DBI package. Key details below, full details (and other functions I find useful) in my GitHub repository here .
write_to_database <- function(input_tbl, db_connection, db, schema, tbl_name){
# SQL query
sql_query <- glue::glue("SELECT *\n",
"INTO {db}.{schema}.{tbl_name}\n",
"FROM (\n",
dbplyr::sql_render(input_tbl),
"\n) AS from_table")
# run query
DBI::dbExecute(db_connection, as.character(sql_query))
}
The essence of the idea is to construct an SQL query that if you executed it in your database language directly, would give you the desired outcome. In my application this takes the form:
SELECT *
INTO db.schema.table
FROM (
/* sub query for existing table */
) AS alias
Note that this is using SQL server, and your particular SQL syntax might be different. INTO
is the SQL server pattern for writing a table. In the example linked to in the question, the syntax is TO TABLE
.
Thanks to @Simon.SA, I could solve my problem. As he showed in his reply, one can define a custom function and incorporate it in a dplyr
pipeline. My adapted code looks like this:
# Custom function
write_to_database <- function(input_tbl, db_connection, schema, tbl_name){
# SQL query
sql_query <- glue::glue("CREATE TABLE {schema}.{tbl_name} AS (\n",
"SELECT * FROM (\n",
dbplyr::sql_render(input_tbl),
"\n)) WITH DATA;")
# Drop table if it exists
DBI::dbExecute(con, glue::glue("BEGIN\n",
"IF EXISTS\n",
"(SELECT TABNAME FROM SYSCAT.TABLES WHERE TABSCHEMA = '{schema}' AND TABNAME = '{tbl_name}') THEN\n",
"PREPARE stmt FROM 'DROP TABLE {schema}.{tbl_name}';\n",
"EXECUTE stmt;\n",
"END IF;\n",
"END"))
# Run query
DBI::dbExecute(db_connection, as.character(sql_query))
}
# Dplyr pipeline
dplyr::tbl(con, in_schema("SCHEMA_A", "SOURCE_TABLE_NAME")) %>%
dplyr::filter(VARIABLE == "ABC") %>%
show_query() %>%
write_to_database(., con, "SCHEMA_B", "NEW_TABLE_NAME")
It turns out that DB2 appears to not know DROP TABLE IF EXISTS
such that some additional programming is necessary. I used this Stackoverflow post to get it done. Furthermore, in my case, I do not need to specify the database explicitly such that the parameter db
in the custom function is left out.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.