Which DBI function for statements like `create table <tabX> as select * from <tabY>` in R?
I am using DBI / ROracle.
drv <- dbDriver("Oracle")
conn <- dbConnect(drv, ...)
I need to create a table from a select query on another table (i.e., a statement like `create table <tabX> as select * from <tabY>`).
There seem to be several functions that can perform this task, e.g.:
dbSendQuery(conn, "create table tab1 as select * from bigtable")
# Statement: create table tab1 as select * from bigtable
# Rows affected: 28196
# Row count: 0
# Select statement: FALSE
# Statement completed: TRUE
# OCI prefetch: FALSE
# Bulk read: 1000
# Bulk write: 1000
Or:
dbExecute(conn, "create table tab2 as select * from bigtable")
# [1] 28196
Or even:
tab3 <- dbGetQuery(conn, "select * from bigtable")
dbWriteTable(conn = conn, "TAB3", tab3)
# [1] TRUE
Each method seems to work, but I guess there are differences in performance/best practice. What is the best/most efficient way to run statements like `create table <tabX> as select * from <tabY>`?
I did not find any hint in the DBI and ROracle help pages.
Up front: use `dbExecute` for this; don't use `dbSendQuery`, since that function suggests the expectation of returned data (though it still works).
`dbSendQuery` should only be used when you expect data in return; most connections will do just fine even if you misuse it, but that's the design of it. Instead, use `dbSendStatement` / `dbClearResult`, or better yet just `dbExecute`.
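As a minimal sketch of the recommended `dbExecute` route: the example below assumes an in-memory SQLite database (via the RSQLite package) as a stand-in for the Oracle connection, and uses `mtcars` as a placeholder for `bigtable`. The same call should apply to a ROracle connection, though the reported row count may differ between drivers.

```r
library(DBI)

# Assumption: in-memory SQLite stands in for the Oracle connection.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "bigtable", mtcars)  # placeholder source table

# dbExecute() sends the statement, clears the result set for you,
# and returns the number of rows affected as reported by the driver.
n <- dbExecute(con, "create table tab2 as select * from bigtable")

# Check that the new table exists and has as many rows as the source.
tab2_exists <- dbExistsTable(con, "tab2")
tab2_rows <- dbGetQuery(con, "select count(*) as n from tab2")$n

dbDisconnect(con)
```

No data frame travels through R here; the copy happens entirely on the database side, which is the main practical difference from the `dbGetQuery` + `dbWriteTable` route in the question.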
The following are groups of perfectly equivalent pathways:
To retrieve data:
dat <- dbGetQuery(con, qry)
res <- dbSendQuery(con, qry); dat <- dbFetch(res); dbClearResult(res)
To run a statement that returns no data (e.g., `UPDATE` or `INSERT`):
dbExecute(con, stmt)
res <- dbSendStatement(con, stmt); dbClearResult(res)
res <- dbSendQuery(con, stmt); dbClearResult(res)  # I think some DBs complain about this method
If you choose `dbSend*`, you should always call `dbClearResult` when done with the statement/fetch. (R will often clean up after you, but if something goes wrong here, and I have hit this a few times over the last few years, the connection locks up and you must recreate it. This can leave orphan connections on the database as well.)
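One way to make sure `dbClearResult` always runs, even when the fetch fails, is to register it with `on.exit()` inside a small wrapper. This is only a sketch: `fetch_and_clear` is a hypothetical helper name, and the in-memory SQLite connection (RSQLite) is an assumption standing in for the real database.

```r
library(DBI)

# Assumption: in-memory SQLite stands in for the real database.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "mtcars", mtcars)

# Hypothetical helper: fetch a query result and always release the result set.
fetch_and_clear <- function(con, qry) {
  res <- dbSendQuery(con, qry)
  # on.exit() fires when the function exits, including on an error in dbFetch().
  on.exit(dbClearResult(res), add = TRUE)
  dbFetch(res)
}

dat <- fetch_and_clear(con, "SELECT * FROM mtcars WHERE cyl = 4")

dbDisconnect(con)
```

This is essentially what `dbGetQuery` already does for you, so the wrapper is mostly worth writing when extra steps (such as binding or chunked fetching) are needed between sending and clearing.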
I think most use cases are single-query-and-out, meaning `dbGetQuery` and `dbExecute` are the easiest to use. However, there are times when you may want to repeat a query. An example from `?dbSendQuery`:
# Pass multiple sets of values with dbBind():
rs <- dbSendQuery(con, "SELECT * FROM mtcars WHERE cyl = ?")
dbBind(rs, list(6L))
dbFetch(rs)
dbBind(rs, list(8L))
dbFetch(rs)
dbClearResult(rs)
(I think it's a little hasty in that documentation to call `dbFetch` without capturing the data. I would expect `dat <- dbFetch(..)`; discarding the return value here seems counter-productive.)
One advantage to doing this multi-step (requiring `dbClearResult`) is with more complex queries: database servers in general tend to "compile" or optimize a query based on its execution engine. This is not always a very expensive step for the server to execute, and it can pay huge dividends on data retrieval. The server often caches this optimized query, and when it sees the same query it uses the already-optimized version. This is one case where using parameter binding can really help, as the query is identical in repeated use and therefore never needs to be re-optimized.
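To illustrate reusing one prepared query across several parameter values while keeping each result, here is a small sketch. It assumes an in-memory SQLite database (RSQLite) and the `?` placeholder style; placeholder syntax varies by driver, so check your backend's documentation before copying this verbatim.

```r
library(DBI)

# Assumption: in-memory SQLite stands in for the real database.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "mtcars", mtcars)

# Send the query once; only the bound value changes between executions.
rs <- dbSendQuery(con, "SELECT * FROM mtcars WHERE cyl = ?")

# Bind each value in turn and capture the fetched data frame.
results <- lapply(c(6L, 8L), function(cyl) {
  dbBind(rs, list(cyl))
  dbFetch(rs)
})
names(results) <- c("cyl6", "cyl8")

# Release the result set once all bindings are done.
dbClearResult(rs)
dbDisconnect(con)
```

Because the SQL text never changes, the server has the opportunity to reuse its plan for every binding; whether it actually does depends on the database.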
FYI, parameter binding can be done repeatedly as shown above using `dbBind`, or it can be done with `dbGetQuery` using the `params=` argument. For instance, this equivalent set of expressions will return the same results as above:
qry <- "SELECT * FROM mtcars WHERE cyl = ?"
dat6 <- dbGetQuery(con, qry, params = list(6L))
dat8 <- dbGetQuery(con, qry, params = list(8L))
As for `dbWriteTable`, for me it's mostly a matter of convenience for quick work. There are times when the DBI/ODBC connection uses the wrong datatype on the server (e.g., SQL Server's `DATETIME` instead of `DATETIMEOFFSET`, or `NVARCHAR(32)` versus `varchar(max)`), so if I need something quickly, I'll use `dbWriteTable`; otherwise I formally define the table with the server datatypes that I know I want, as in `dbExecute(con, "create table quux (...)")`. This is by far not a "best practice"; it is heavily rooted in preference and convenience. For data that is easy (float/integer/string) and where the server default datatypes are acceptable, `dbWriteTable` is perfectly fine. One can also use `dbCreateTable` (which creates the table without uploading data), which allows you to specify the fields with a bit more control.
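A rough sketch of the `dbCreateTable` route: define the column types explicitly, then load rows with `dbAppendTable`. The table name `quux` is borrowed from the text above, but its columns and types are made up for illustration, and the in-memory SQLite connection (RSQLite) is an assumption; on another server you would substitute that server's type names.

```r
library(DBI)

# Assumption: in-memory SQLite stands in for the real database.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Create an empty table with explicit SQL types (hypothetical columns).
dbCreateTable(
  con, "quux",
  fields = c(id = "INTEGER", label = "TEXT", value = "REAL")
)

# Append rows to the table defined above.
quux <- data.frame(
  id = 1:3,
  label = c("a", "b", "c"),
  value = c(0.5, 1.5, 2.5),
  stringsAsFactors = FALSE
)
dbAppendTable(con, "quux", quux)

# Inspect the resulting columns and row count.
cols <- dbListFields(con, "quux")
n_rows <- dbGetQuery(con, "select count(*) as n from quux")$n

dbDisconnect(con)
```

Separating table definition from data upload keeps the datatype decision in your hands rather than leaving it to the driver's defaults.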