
MonetDB.R bulk insert

Is there a way to do a bulk insert using MonetDB.R (not via a for loop and dbSendUpdate)?

Does dbWriteTable allow for updates (append = TRUE)?

About "INSERT INTO" the MonetDB documentation states: "The benefit is clear: this is very straightforward. However, this is a seriously inefficient way of doing things in MonetDB." 关于MonetDB文档的“插入”,MonetDB文档指出:“好处很明显:这非常简单。但是,这是在MonetDB中处理效率极低的方法。”

Thanks.

Hannes might have a smarter solution, but for the time being, this might help :)

# load the MonetDB.R package (provides the DBI interface used below)
library( MonetDB.R )

# start with an example data set
nrow( mtcars )

# and a MonetDB.R connection
db

# here's how many records you'd have if you stack your example data three times
nrow( mtcars ) * 3

# write to three separate tables
dbWriteTable( db , 'mtcars1' , mtcars )
dbWriteTable( db , 'mtcars2' , mtcars )
dbWriteTable( db , 'mtcars3' , mtcars )

# stack them all
dbSendUpdate( db , "CREATE TABLE mtcars AS SELECT * FROM mtcars1 UNION ALL SELECT * FROM mtcars2 UNION ALL SELECT * FROM mtcars3 WITH DATA" )

# correct number of records
nrow( dbReadTable( db , 'mtcars' ) )

I'll consider it. monetdb.read.csv does use COPY INTO, so you might get away with creating a temporary CSV file.
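A rough sketch of that temp-CSV route (the file handling here is illustrative, not what monetdb.read.csv actually does internally, and it assumes the MonetDB server runs on the same machine as R so it can read the file):

# dump the data.frame to a temporary CSV file, then bulk-load it
# in a single statement with COPY INTO (illustrative sketch)
tmp <- tempfile( fileext = ".csv" )
write.table( mtcars , tmp , sep = "," , row.names = FALSE , col.names = FALSE )
dbSendUpdate( db , paste0( "COPY INTO mtcars FROM '" , tmp , "' USING DELIMITERS ','" ) )
file.remove( tmp )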

I see what you mean, although this doesn't change the fact that dbWriteTable uses a for loop and "INSERT INTO", which can be rather slow. I may not have been very clear in my initial post.

As a workaround, I guess "START TRANSACTION" and "COMMIT" with dbSendUpdate might work.
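A minimal sketch of that workaround (assuming the mtcars table from the answer above already exists, and relying on every mtcars column being numeric so the values can be pasted straight into the statement):

# wrap the row-by-row INSERTs in a single transaction so that
# MonetDB does not commit after every individual statement
dbSendUpdate( db , "START TRANSACTION" )
for ( i in seq_len( nrow( mtcars ) ) ){
	dbSendUpdate( db , paste0( "INSERT INTO mtcars VALUES (" , paste( mtcars[ i , ] , collapse = "," ) , ")" ) )
}
dbSendUpdate( db , "COMMIT" )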

Ideally, something like this would be great:

"COPY INTO table FROM data.frame" “从data.frame复制到表中”

We just published version 0.9.4 of MonetDB.R on CRAN. The main change in this release is a set of major improvements to the dbWriteTable method. By default, INSERTs are now chunked into 1000 rows per statement. Also, if the database is running on the same machine as R, you may use the csvdump=T parameter. This writes the data.frame to a local temporary CSV file, then uses an automatically generated COPY INTO statement to import it. Both of these methods are obviously designed to improve the speed with which dbWriteTable imports data. Also, the append/overwrite parameter handling has been fixed.
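A short usage sketch based on the parameters described above (the exact calls are an assumption, and csvdump only helps when R and the MonetDB server share a filesystem):

# default behaviour in 0.9.4: INSERTs are chunked into 1000 rows per statement
dbWriteTable( db , 'mtcars' , mtcars , append = TRUE )

# when the MonetDB server runs on the same machine as R:
# write a temporary CSV file and import it via COPY INTO
dbWriteTable( db , 'mtcars' , mtcars , append = TRUE , csvdump = TRUE )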
