Bulk inserts and INSERT INTO IGNORE with CrateDB

I'd like to insert huge amounts of data. What should I use: single INSERT INTO statements, or do I have to use bulk inserts? Is there something else? The reason I ask is that, with single inserts, my CrateDB node's disk only writes about 11kb/s on average while the disk load is at 100%!

Furthermore, is something like INSERT INTO IGNORE supported? Can I just throw my data at CrateDB in bulk and have it ignore duplicate entries?

Thanks!

As you rightly guessed, bulk inserts give you the best performance. However, results can vary, mostly depending on the chosen "bulk size", i.e. how many records are sent at once. A batch of 1000 records usually performs very well, but it's worth experimenting a bit, since the best size may depend on the hardware CrateDB runs on.
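To make the batching idea concrete, here is a minimal sketch in Python using only the standard library. It assumes a CrateDB node reachable on its default HTTP port at `http://localhost:4200/_sql` and uses the HTTP endpoint's bulk form, where one `stmt` is sent together with a `bulk_args` list holding one argument list per row. The helper names (`chunked`, `bulk_payload`, `send`, `bulk_insert`) are hypothetical, and the default batch size of 1000 is just the starting point suggested above, not a tuned value.

```python
import json
import urllib.request
from typing import Any, Iterator, List, Sequence

# Assumption: a CrateDB node listening on its default HTTP port.
CRATE_SQL_URL = "http://localhost:4200/_sql"


def chunked(rows: Sequence[Sequence[Any]], size: int = 1000) -> Iterator[list]:
    """Yield consecutive slices of `rows` with at most `size` rows each."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(rows), size):
        yield list(rows[start:start + size])


def bulk_payload(stmt: str, batch: Sequence[Sequence[Any]]) -> dict:
    """Build one bulk request: a single statement plus one argument list per row."""
    return {"stmt": stmt, "bulk_args": [list(row) for row in batch]}


def send(payload: dict, url: str = CRATE_SQL_URL) -> dict:
    """POST one bulk request as JSON and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def bulk_insert(stmt: str, rows: Sequence[Sequence[Any]],
                size: int = 1000, url: str = CRATE_SQL_URL) -> List[dict]:
    """Send `rows` in batches of `size` (one HTTP request per batch)."""
    return [send(bulk_payload(stmt, batch), url) for batch in chunked(rows, size)]
```

For example, with a hypothetical table `events (id INTEGER PRIMARY KEY, payload TEXT)` and 10,000 rows, `bulk_insert("INSERT INTO events (id, payload) VALUES (?, ?)", rows)` sends ten requests of 1000 rows each instead of 10,000 single-row requests. Changing `size` is the knob to experiment with on your own hardware.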

Bulk inserts will also skip duplicate entries automatically, provided you have a primary key defined on the table (how else would the DB know what counts as a duplicate?). This does come with a performance cost, though, since each duplicate causes a needless lookup and a failed insert.
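One way to keep that cost down is to remove duplicates on the client before sending, and to check the bulk response afterwards to see how many rows were skipped. The sketch below assumes CrateDB's bulk response format, in which the `results` array has one entry per argument set and a `rowcount` of -2 marks an entry that failed (for example, because its primary key already exists). The helper names `dedupe_by_key` and `summarize` are hypothetical.

```python
from typing import Any, Iterable, List, Sequence, Tuple

# Assumption: in a CrateDB bulk response, a failed argument set
# (e.g. a duplicate primary key) reports rowcount == -2.
FAILED_ROWCOUNT = -2


def dedupe_by_key(rows: Iterable[Sequence[Any]], key_index: int = 0) -> List[Sequence[Any]]:
    """Drop rows whose primary-key column repeats an earlier row; the first one wins."""
    seen = set()
    unique = []
    for row in rows:
        key = row[key_index]
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def summarize(response: dict) -> Tuple[int, int]:
    """Return (inserted, failed) counts from one bulk response."""
    results = response.get("results", [])
    failed = sum(1 for r in results if r.get("rowcount") == FAILED_ROWCOUNT)
    inserted = sum(r["rowcount"] for r in results if r.get("rowcount", 0) > 0)
    return inserted, failed
```

Client-side deduplication only catches duplicates within the data you are about to send; rows whose key already exists in the table will still be rejected by CrateDB and show up in the `failed` count.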

Depending on what you want to achieve, you should also consider using an "insert or update" (upsert) statement.
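As a rough illustration, assuming a CrateDB version that supports the `ON CONFLICT` clause (3.0 or later), the "ignore" and "insert or update" variants can be written as a single parameterized statement. The small Python helper below, `insert_statement`, is hypothetical; it just builds the SQL text, which can then be used as the statement of a bulk request.

```python
from typing import Sequence


def insert_statement(table: str, columns: Sequence[str], pk: str, mode: str = "ignore") -> str:
    """Build a parameterized INSERT that ignores or updates rows whose key already exists.

    mode="ignore": ON CONFLICT (pk) DO NOTHING, similar in spirit to MySQL's INSERT IGNORE.
    mode="update": ON CONFLICT (pk) DO UPDATE SET col = excluded.col for every non-key column.

    Note: table and column names are interpolated directly, so they must come
    from trusted code, never from user input. Values are passed as ? parameters.
    """
    cols = ", ".join(columns)
    params = ", ".join("?" for _ in columns)
    stmt = f"INSERT INTO {table} ({cols}) VALUES ({params})"
    if mode == "ignore":
        return f"{stmt} ON CONFLICT ({pk}) DO NOTHING"
    if mode == "update":
        updates = ", ".join(f"{c} = excluded.{c}" for c in columns if c != pk)
        if not updates:
            raise ValueError("no non-key columns to update")
        return f"{stmt} ON CONFLICT ({pk}) DO UPDATE SET {updates}"
    raise ValueError(f"unknown mode: {mode!r}")
```

With a hypothetical table `events (id, payload)`, `mode="ignore"` silently skips existing keys instead of reporting them as failures, while `mode="update"` overwrites `payload` with the incoming value.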
