

Which DBI function for statements like `create table <tabX> as select * from <tabY>` in R?

I am using DBI / ROracle:

library(DBI)
library(ROracle)
drv <- dbDriver("Oracle")
conn <- dbConnect(drv, ...)

I need to create a table from a select query on another table (i.e., a statement like `create table <tabX> as select * from <tabY>`).

There seem to be several functions that can perform this task, e.g.:

dbSendQuery(conn, "create table tab1 as select * from bigtable")
# Statement:            create table tab1 as select * from bigtable 
# Rows affected:        28196 
# Row count:            0 
# Select statement:     FALSE 
# Statement completed:  TRUE 
# OCI prefetch:         FALSE 
# Bulk read:            1000 
# Bulk write:           1000 

Or:

dbExecute(conn, "create table tab2 as select * from bigtable")
# [1] 28196

Or even:

tab3 <- dbGetQuery(conn, "select * from bigtable")
dbWriteTable(conn = conn, "TAB3", tab3)
# [1] TRUE

Each method seems to work, but I guess there are differences in performance and best practice. What is the best/most efficient way to run statements like `create table <tabX> as select * from <tabY>`?

I did not find any hint in the DBI and ROracle help pages.

Up front: use `dbExecute` for this; don't use `dbSendQuery`, because that function implies you expect data to be returned (though it still works).

`dbSendQuery` should only be used when you expect data in return; most connections will do just fine even if you misuse it, but that is what it was designed for. Instead, use `dbSendStatement`/`dbClearResult`, or better yet just `dbExecute`.
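To illustrate the recommendation, here is a minimal sketch that assumes an in-memory SQLite database (via the RSQLite package) in place of the Oracle connection; the table names mirror the question. Both `dbExecute` and the `dbSendStatement`/`dbClearResult` pair run the `CREATE TABLE ... AS SELECT` entirely on the server, whereas the question's third approach (`dbGetQuery` followed by `dbWriteTable`) first pulls every row into R and then uploads it again.

```r
library(DBI)

# Assumption: in-memory SQLite (RSQLite) stands in for the Oracle connection.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "bigtable", mtcars)

# One call: the copy happens on the server, no rows travel through R.
dbExecute(con, "CREATE TABLE tab2 AS SELECT * FROM bigtable")

# Two-step equivalent: send the statement, then always clear the result.
res <- dbSendStatement(con, "CREATE TABLE tab3 AS SELECT * FROM bigtable")
dbClearResult(res)

# Check that both copies have the same number of rows as the source.
n_src  <- dbGetQuery(con, "SELECT COUNT(*) AS n FROM bigtable")$n
n_tab2 <- dbGetQuery(con, "SELECT COUNT(*) AS n FROM tab2")$n
n_tab3 <- dbGetQuery(con, "SELECT COUNT(*) AS n FROM tab3")$n

dbDisconnect(con)
```

With ROracle the same `dbExecute(conn, "create table tab2 as select * from bigtable")` call applies; only the connection setup differs.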

The following pathways are equivalent:

  • To retrieve data:
    • `dat <- dbGetQuery(con, qry)`
    • `res <- dbSendQuery(con, qry); dat <- dbFetch(res); dbClearResult(res)`
  • To send a statement (one that does not return data, e.g. `UPDATE` or `INSERT`):
    • `dbExecute(con, stmt)`
    • `res <- dbSendStatement(con, stmt); dbClearResult(res)`
    • (sloppy) `res <- dbSendQuery(con, stmt); dbClearResult(res)` (I think some DBs complain about this method)
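A minimal sketch of these equivalences, assuming an in-memory RSQLite database loaded with the built-in `mtcars` data. For the statement pathway it uses `dbGetRowsAffected` (a DBI generic not mentioned above) to read the row count that `dbExecute` returns directly.

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")  # assumption: SQLite stand-in
dbWriteTable(con, "mtcars", mtcars)

qry <- "SELECT * FROM mtcars WHERE cyl = 4"
# Retrieving data in one call ...
dat_a <- dbGetQuery(con, qry)
# ... or as send / fetch / clear.
res <- dbSendQuery(con, qry)
dat_b <- dbFetch(res)
dbClearResult(res)

stmt <- "UPDATE mtcars SET carb = carb + 1 WHERE cyl = 6"
# A statement in one call returns the number of rows affected ...
n_a <- dbExecute(con, stmt)
# ... or as send / ask for rows affected / clear.
res <- dbSendStatement(con, stmt)
n_b <- dbGetRowsAffected(res)
dbClearResult(res)

dbDisconnect(con)
```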

If you choose `dbSend*`, always call `dbClearResult` when you are done with the statement/fetch. (R will often clean up after you, but if something goes wrong here, and I have hit this a few times over the last few years, the connection locks up and you must recreate it. This can also leave orphaned connections on the database.)
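One defensive pattern for the `dbSend*` route is to register `dbClearResult` with `on.exit()` immediately after sending, so the result is released even if fetching fails partway. This is a sketch with a hypothetical helper, `fetch_all()`, run against an assumed in-memory RSQLite database.

```r
library(DBI)

# Hypothetical helper: send a query, fetch all rows, and always clear the
# result, even when dbFetch() throws an error.
fetch_all <- function(con, qry, ...) {
  res <- dbSendQuery(con, qry, ...)
  on.exit(dbClearResult(res), add = TRUE)
  dbFetch(res)
}

con <- dbConnect(RSQLite::SQLite(), ":memory:")  # assumption: SQLite stand-in
dbWriteTable(con, "mtcars", mtcars)
dat <- fetch_all(con, "SELECT mpg, cyl FROM mtcars WHERE cyl = 8")
dbDisconnect(con)
```

In practice `dbGetQuery` already does this cleanup for you; the helper is only worth writing when you need the intermediate result object.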

I think most use cases are single-query-and-out, meaning `dbGetQuery` and `dbExecute` are the easiest to use. However, there are times when you may want to repeat a query. An example from `?dbSendQuery`:

     # Pass multiple sets of values with dbBind():
     rs <- dbSendQuery(con, "SELECT * FROM mtcars WHERE cyl = ?")
     dbBind(rs, list(6L))
     dbFetch(rs)
     dbBind(rs, list(8L))
     dbFetch(rs)
     dbClearResult(rs)

(I think that documentation is a little hasty in calling `dbFetch` without capturing the data... I would expect `dat <- dbFetch(..)`; discarding the return value here seems counterproductive.)
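Capturing each fetch instead of discarding it might look like the following sketch (in-memory RSQLite assumed; `by_cyl` is just an illustrative name). The query is sent once and re-bound for each value, which lets the driver reuse a single prepared statement.

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")  # assumption: SQLite stand-in
dbWriteTable(con, "mtcars", mtcars)

# Prepare once, then bind and fetch repeatedly, keeping every result.
rs <- dbSendQuery(con, "SELECT * FROM mtcars WHERE cyl = ?")
by_cyl <- lapply(c(4L, 6L, 8L), function(cyl) {
  dbBind(rs, list(cyl))
  dbFetch(rs)
})
names(by_cyl) <- c("4", "6", "8")
dbClearResult(rs)
dbDisconnect(con)
```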

One advantage of this multi-step approach (which requires `dbClearResult`) shows up with more complex queries: database servers in general tend to "compile" or optimize a query for their execution engine. This step is not always expensive for the server, and it can pay huge dividends on data retrieval. The server often caches the optimized query, and when it sees the same query again it uses the already-optimized version. This is one case where parameter binding really helps: the query text is identical across repeated uses and therefore never needs to be re-optimized.

FYI, parameter binding can be done repeatedly with `dbBind` as shown above, or with `dbGetQuery` using its `params=` argument. For instance, this equivalent set of expressions returns the same results as above:

qry <- "SELECT * FROM mtcars WHERE cyl = ?"
dat6 <- dbGetQuery(con, qry, params = list(6L))
dat8 <- dbGetQuery(con, qry, params = list(8L))

As for `dbWriteTable`, for me it is mostly a matter of convenience for quick work. There are times when the DBI/ODBC connection uses the wrong datatype on the server (e.g., SQL Server's `DATETIME` instead of `DATETIMEOFFSET`, or `NVARCHAR(32)` versus `varchar(max)`), so if I need something quickly I'll use `dbWriteTable`; otherwise I formally define the table with the server datatypes I know I want, as in `dbExecute(con, "create table quux (...)")`. This is by no means a "best practice"; it is heavily rooted in preference and convenience. For easy data (float/integer/string) where the server's default datatypes are acceptable, `dbWriteTable` is perfectly fine. One can also use `dbCreateTable` (which creates the table without uploading data), which lets you specify the fields with a bit more control.
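The "define the types yourself" route can be sketched with `dbCreateTable` plus `dbAppendTable` (a companion DBI generic that inserts rows into an existing table). This assumes an in-memory RSQLite database; the type names `TEXT`, `REAL` and `INTEGER` are SQLite's, so on Oracle or SQL Server you would substitute that server's types. The table name `cars_typed` is illustrative.

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")  # assumption: SQLite stand-in

# Create an empty table with explicitly chosen column types
# (a named character vector: names are columns, values are SQL types).
dbCreateTable(con, "cars_typed",
              fields = c(model = "TEXT", mpg = "REAL", cyl = "INTEGER"))

# Upload rows into the existing table; returns the number of rows written.
df <- data.frame(model = rownames(mtcars), mpg = mtcars$mpg,
                 cyl = as.integer(mtcars$cyl))
n_appended <- dbAppendTable(con, "cars_typed", df)

# Inspect the declared column types (SQLite-specific PRAGMA).
col_types <- dbGetQuery(con, "PRAGMA table_info(cars_typed)")[, c("name", "type")]
dbDisconnect(con)
```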
