
MongoDB Performance: single collection vs multiple collections for concurrent read/writes

Posted 2020-04-30 20:35:48 | Tags: node.js, mongodb, concurrency, locking, database-performance

I'm using a local database on my web server to sync certain data from external APIs, and the same database serves the web application. The synced data is different for each user who visits the web app. Because the sync job writes to the DB periodically but continuously while users are reading their data from the web page, I'm wondering which layout would give me the best performance.

Since the sync job is continuously writing to the DB, I believe the collection is locked until the job is done. I'm thinking that having multiple collections would help here, since the lock would be on the particular collection being written to rather than on the same single collection every time.

Is my thinking correct here? I basically don't want reads to get throttled because the write operations keep locking up one collection.

2 Answers

There is an extensive amount of information about lock granularity, and about locking in MongoDB in general, here.

In general, writing to multiple collections can be faster than writing to a single collection, provided "multiple" means a small to medium number and all of the collections are created in advance. The cost is that queries become awkward, and potentially slow: for example, you may have to combine collections in the aggregation pipeline instead of performing a single collection or index scan.
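To make the query cost concrete, here is a minimal TypeScript sketch that only builds the pipeline documents (it never talks to a server). It assumes a hypothetical layout of one `synced_<userId>` collection per user versus a single `synced` collection with a `userId` field; `$unionWith` needs MongoDB 4.4 or later.

```typescript
// A pipeline stage is just a plain document.
type Stage = Record<string, unknown>;

// Single shared collection (hypothetical name "synced"): one $match that an
// index on { userId: 1 } can serve.
function singleCollectionPipeline(userIds: readonly string[]): Stage[] {
  return [{ $match: { userId: { $in: userIds } } }];
}

// One collection per user (hypothetical names "synced_<userId>"): the
// aggregation starts on the first collection and has to pull in every other
// one with its own $unionWith stage.
function perUserCollectionsPipeline(
  userIds: readonly string[],
): { from: string; pipeline: Stage[] } {
  const names = userIds.map((id) => `synced_${id}`);
  const first = names[0];
  if (first === undefined) {
    throw new Error("at least one userId is required");
  }
  return {
    from: first, // the collection to call aggregate() on
    pipeline: names.slice(1).map((coll) => ({ $unionWith: { coll } })),
  };
}
```

Reading three users' data is one `$match` on one collection in the first layout, but two `$unionWith` stages, each reading a further collection, in the second.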

If you have so many collections that the number of open files causes the database or the operating system to start evicting files from their respective caches, performance will start dropping again.
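A rough way to see how quickly the file count grows: WiredTiger stores each collection and each index in its own file. The sketch below assumes a hypothetical one-collection-per-user layout and deliberately ignores WiredTiger's metadata and journal files.

```typescript
// Rough count of WiredTiger data files: one file per collection plus one per
// index, including the mandatory _id index. Metadata, journal and diagnostic
// files are not counted.
function estimateWiredTigerFiles(
  collections: number,
  secondaryIndexesPerCollection: number,
): number {
  const indexFiles = 1 + secondaryIndexesPerCollection; // _id index + secondary indexes
  return collections * (1 + indexFiles); // collection file + its index files
}
```

For a hypothetical 50000 users with one secondary index each, `estimateWiredTigerFiles(50000, 1)` gives 150000 files, versus 3 for one shared collection (`estimateWiredTigerFiles(1, 1)`).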

Creating collections can also be relatively slow, so doing it under load may hurt performance.
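One way to keep collection creation out of the request path is to work out at startup which collections are missing and create them before serving traffic. The helper below is a hypothetical sketch; with the Node.js driver, `existing` could come from `db.listCollections({}, { nameOnly: true }).toArray()` (mapping each entry to its `name`), and each returned name could then be passed to `db.createCollection(name)`.

```typescript
// Hypothetical startup helper: returns the collections that still have to be
// created, deduplicated and in the order they were requested.
function collectionsToCreate(
  wanted: readonly string[],
  existing: readonly string[],
): string[] {
  const have = new Set(existing);
  return Array.from(new Set(wanted)).filter((name) => !have.has(name));
}
```

This only helps when the set of collections is known up front; with one collection per user, every new user still triggers a creation at runtime.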

Collection-level locking was only ever a short-lived feature of the old MMAPv1 storage engine (MongoDB 3.0). Before the WiredTiger storage engine arrived (introduced in MongoDB 3.0 and the default since 3.2), there were plenty of occasions when the whole database would lock.

Nowadays, thanks to WiredTiger's document-level concurrency control, writing to a single collection from multiple threads and/or processes is extremely efficient. The right way to distribute a very heavy write load in MongoDB is to shard your collection.
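For the setup in the question, a single-collection layout might look like the sketch below. The document shape, the `app.synced` namespace and the `{ userId, externalId }` key are assumptions for illustration; the functions only build documents and leave the actual driver calls to the caller.

```typescript
// Hypothetical shape of one record synced from an external API.
interface SyncedItem {
  userId: string;
  externalId: string;
  payload: Record<string, unknown>;
}

// One compound index serves both "all data for this user" reads and the
// upsert filter used by the sync job.
const syncIndexSpec = { userId: 1, externalId: 1 } as const;

// Builds operations in the shape accepted by the Node.js driver's
// Collection.bulkWrite(). Each upsert touches one document; WiredTiger lets
// writes to different documents of the same collection proceed concurrently
// and serves reads from snapshots instead of blocking them.
function buildSyncOps(items: readonly SyncedItem[], syncedAt: Date) {
  return items.map((item) => ({
    updateOne: {
      filter: { userId: item.userId, externalId: item.externalId },
      update: { $set: { payload: item.payload, syncedAt } },
      upsert: true,
    },
  }));
}

// Admin command document that shards the collection on the same fields, so
// the index above also supports the shard key (sharded cluster only).
function shardCommand(namespace: string) {
  return { shardCollection: namespace, key: syncIndexSpec };
}
```

With the Node.js driver these would be passed to `collection.createIndex(syncIndexSpec)`, `collection.bulkWrite(buildSyncOps(items, new Date()), { ordered: false })` and, once a sharded cluster exists, `client.db("admin").command(shardCommand("app.synced"))`. Reusing the index fields as the shard key keeps per-user reads targeted rather than broadcast to every shard; a hashed key on `userId` is a common alternative when writes need to be spread more evenly.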

To compare sharded and unsharded configurations, you can easily spin up both in parallel with MongoDB Atlas.