
What's the I/O operations count on a Bulk insert? (on MongoDB)

I am wondering how the following two approaches differ in terms of I/O operations per second.

Bulk


var bulk = db.items.initializeUnorderedBulkOp();
bulk.insert( { _id: 1, item: "First Item", value: 100 } );
bulk.insert( { _id: 2, item: "Second Item", value: 300 } );
bulk.insert( { _id: 3, item: "Third Item", value: 0 } );
bulk.execute();

Normal

db.items.insert( { _id: 1, item: "First Item", value: 100 } );
db.items.insert( { _id: 2, item: "Second Item", value: 150 } );
db.items.insert( { _id: 3, item: "Third Item", value: 0 } );

Basically, what I want to know is whether a bulk insert performs fewer I/O operations than the same documents inserted one at a time.

Disk I/O, measured in bytes written, would be about the same, since the same amount of data has to be written in either case. The main difference is in the amount of back-and-forth between client and server.
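To make the back-and-forth concrete, here is a rough model of how many request/response round trips each approach needs. The roundTrips helper is hypothetical, and the batch limit of 100,000 operations is an assumption based on the maxWriteBatchSize value reported by MongoDB 3.6 and later; it also ignores the per-message size limit, so treat it as a sketch rather than an exact count.

```javascript
// Rough model: one request/response round trip per write command sent.
// maxBatch is an assumption (100000 is maxWriteBatchSize on MongoDB 3.6+).
function roundTrips(docCount, maxBatch) {
  return Math.ceil(docCount / maxBatch);
}

// Individual inserts: every document travels in its own command.
const individual = roundTrips(50000, 1); // 50000 round trips

// Bulk insert: all 50,000 documents fit under the assumed batch limit.
const bulk = roundTrips(50000, 100000); // 1 round trip
```

The bytes of document data are the same in both cases; what changes is how many times the client has to wait for the server to acknowledge a command.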

For example, I wrote a small script that inserts 50,000 documents using each method and records the time taken, while I monitored the server with mongostat.
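The script itself is not shown here. A minimal sketch of such a comparison, written for the mongo shell, might look like the following. All helper names (makeDocs, insertIndividually, insertBulk, timeSeconds) and the document shape are assumptions made for illustration, not the original script.

```javascript
// Hypothetical helpers; `coll` is expected to be a mongo shell
// collection object such as db.items.

// Build n small documents shaped like the ones in the question.
function makeDocs(n) {
  const docs = [];
  for (let i = 1; i <= n; i++) {
    docs.push({ _id: i, item: "Item " + i, value: i % 500 });
  }
  return docs;
}

// One insert command (and one round trip) per document.
function insertIndividually(coll, docs) {
  docs.forEach(function (doc) {
    coll.insertOne(doc);
  });
}

// Queue everything on the client, then send it with a single execute().
function insertBulk(coll, docs) {
  const bulk = coll.initializeUnorderedBulkOp();
  docs.forEach(function (doc) {
    bulk.insert(doc);
  });
  return bulk.execute();
}

// Wall-clock seconds taken by fn().
function timeSeconds(fn) {
  const start = Date.now();
  fn();
  return (Date.now() - start) / 1000;
}
```

In the shell, something like timeSeconds(function () { insertBulk(db.items, makeDocs(50000)); }) would time the bulk path. Both paths reuse the same _id values, so the collection would need to be emptied between runs.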

For individual inserts:

start time: 1566954225.15597
end time: 1566954259.938584
elapsed: 34.78261399269104

mongostat output:

insert query update delete getmore command dirty used flushes vsize   res qrw arw net_in net_out conn     set repl                time
    *0    *0     *0     *0       0     0|0  1.0% 3.7%       0 5.33G 80.0M 0|0 1|0   262b   33.1k    2 replset  PRI Aug 28 11:03:44.432
   242    *0     *0     *0       0     3|0  1.0% 3.7%       0 5.33G 80.0M 0|0 1|0  77.2k   90.3k    4 replset  PRI Aug 28 11:03:45.438
  1442    *0     *0     *0       0     2|0  1.1% 3.8%       0 5.33G 81.0M 0|1 1|0   453k    365k    4 replset  PRI Aug 28 11:03:46.431
  1559    *0     *0     *0       0     0|0  1.2% 3.9%       0 5.33G 82.0M 0|0 1|1   490k    392k    4 replset  PRI Aug 28 11:03:47.431
  1228    *0     *0     *0       0     1|0  1.3% 4.0%       0 5.33G 82.0M 0|0 1|0   385k    315k    4 replset  PRI Aug 28 11:03:48.430
  1442    *0     *0     *0       0     0|0  1.3% 4.1%       0 5.33G 83.0M 0|0 1|0   454k    365k    4 replset  PRI Aug 28 11:03:49.433
... skipped 24 lines ...
  1464    *0     *0     *0       0     1|0  1.6% 5.9%       0 5.35G 99.0M 0|0 1|0   460k    370k    4 replset  PRI Aug 28 11:04:14.429
  1492    *0     *0     *0       0     0|0  1.7% 6.0%       0 5.35G 99.0M 0|0 1|1   469k    376k    4 replset  PRI Aug 28 11:04:15.430
  1519    *0     *0     *0       0     1|0  1.8% 6.1%       0 5.35G  100M 0|1 1|0   478k    383k    4 replset  PRI Aug 28 11:04:16.434
  1475    *0     *0     *0       0     1|0  1.9% 6.2%       0 5.35G  100M 0|0 1|1   464k    373k    4 replset  PRI Aug 28 11:04:17.433
  1210    *0     *0     *0       0     1|0  2.0% 6.2%       0 5.35G  101M 0|0 1|1   380k    312k    4 replset  PRI Aug 28 11:04:18.432
  1318    *0     *0     *0       0     0|0  2.1% 6.3%       0 5.35G  101M 0|1 1|0   414k    336k    4 replset  PRI Aug 28 11:04:19.437
   752    *0     *0     *0       0     1|0  2.1% 6.4%       0 5.35G  102M 0|0 1|0   236k    206k    2 replset  PRI Aug 28 11:04:20.435
    *0    *0     *0     *0       0     1|0  2.1% 6.4%       0 5.35G  102M 0|0 1|0   320b   33.8k    2 replset  PRI Aug 28 11:04:21.436

It took about 34 seconds to insert all 50,000 documents, at roughly 1,400 documents per second. That rate may be a limitation of my laptop, however.

For bulk insert:

start time: 1566954287.503233
end time: 1566954288.55518
elapsed: 1.0519471168518066

mongostat output:

insert query update delete getmore command dirty used flushes vsize  res qrw arw net_in net_out conn     set repl                time
    *0    *0     *0     *0       0     1|0  4.1% 8.6%       0 5.38G 143M 0|0 1|0   264b   33.3k    2 replset  PRI Aug 28 11:04:47.308
 36157    *0     *0     *0       0     4|0  6.3% 10.9%       0 5.41G 164M 0|0 1|1  1.73m   35.0k    4 replset  PRI Aug 28 11:04:48.319
 13556    *0     *0     *0       0     1|0  4.6% 11.3%       0 5.42G 180M 0|0 1|0   264b   33.6k    2 replset  PRI Aug 28 11:04:49.311
    *0    *0     *0     *0       0     1|0  4.6% 11.3%       0 5.42G 180M 0|0 1|0   263b   33.1k    2 replset  PRI Aug 28 11:04:50.311

It took about 1 second to insert all 50,000 documents. Note that within that one second, pretty much all of them are inserted at once.
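As a quick check on the two measurements, the throughput and speedup can be worked out from the elapsed times reported above. The variable names are only for illustration, and the figures apply to this one machine and run.

```javascript
// Elapsed times copied from the two runs above; 50,000 documents in each.
const docCount = 50000;
const individualSeconds = 34.78261399269104;
const bulkSeconds = 1.0519471168518066;

// Documents inserted per second for each approach.
const individualRate = docCount / individualSeconds; // roughly 1,437 docs/s
const bulkRate = docCount / bulkSeconds;             // roughly 47,500 docs/s

// How many times faster the bulk run finished.
const speedup = individualSeconds / bulkSeconds;     // roughly 33x
```

The individual-insert rate agrees with the ~1,400 inserts per second visible in the first mongostat output.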

Thus single inserts are much slower, since every document incurs the overhead of calling the server, sending the data, and receiving the confirmation. In terms of network I/O, a bulk insert involves far less overhead.

