What are the possible disadvantages of creating a new doc instead of updating an existing one in CouchDB?

I have started using CouchDB together with PouchDB in my new project and am relatively new to both. I have a basic question.

To update a doc, I need its current _rev value, which means querying the database first, e.g. as shown here:

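```javascript
// Assumes `db` is an existing PouchDB instance, e.g. const db = new PouchDB('kittens');
// fetch mittens
db.get('mittens').then(function (doc) {
  // update their age
  doc.age = 4;
  // put them back (doc still carries the _rev returned by get)
  return db.put(doc);
}).then(function () {
  // fetch mittens again
  return db.get('mittens');
}).then(function (doc) {
  console.log(doc);
}).catch(function (err) {
  // e.g. a 404 if 'mittens' does not exist, or a 409 conflict if _rev is stale
  console.log(err);
});
```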

After the update, the database holds two revisions of the document. Older revisions are only removed during compaction.

If I add a timestamp to my doc IDs, e.g. shashi@stackoverflow.com-user-1464772888286, then instead of different revisions of the same document, my db contains different documents.
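A minimal sketch of how such IDs could be generated, assuming the layout in the example above (a per-user prefix followed by Date.now() in milliseconds). The helper name makeTimestampedId is hypothetical, not part of PouchDB.

```javascript
// Hypothetical helper: build an _id of the form "<prefix><milliseconds>",
// e.g. "shashi@stackoverflow.com-user-1464772888286".
function makeTimestampedId(prefix, now = Date.now()) {
  return prefix + String(now);
}

const prefix = 'shashi@stackoverflow.com-user-';
console.log(makeTimestampedId(prefix, 1464772888286));
// prints "shashi@stackoverflow.com-user-1464772888286"
```

Two caveats follow from this layout. Two saves within the same millisecond produce the same _id, so the second put of a "new" doc fails with a conflict. And _all_docs orders _ids as strings, so the timestamps sort chronologically only while they all have the same number of digits (Date.now() has 13 digits from 2001 until the year 2286).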

After adding a new document, I can delete the documents with older timestamps. Then, when reading, I can query _all_docs with

startkey="shashi@stackoverflow.com-user-"&endkey="shashi@stackoverflow.com-user-\uffff"   

and take the latest doc based on the timestamp. (Since I delete the older docs whenever I create a new one, this query should return only one doc anyway.)
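The _all_docs range query described above can be sketched as plain option objects plus a small post-processing step. This is a hedged illustration: the helper names are hypothetical, and the objects are meant to be passed to PouchDB's db.allDocs(), which accepts startkey, endkey, descending, limit and include_docs. The latestOptions variant shows an alternative that asks the database for only the newest match instead of relying on older docs having been deleted.

```javascript
// Hypothetical helpers: they only build options and post-process rows,
// so they run without PouchDB.

// Range covering every _id that starts with `prefix`. '\uffff' is a high
// code point, so prefix + '\uffff' sorts after prefix followed by digits.
function rangeOptions(prefix) {
  return { startkey: prefix, endkey: prefix + '\uffff', include_docs: true };
}

// Same range, newest first, capped at one row. With descending: true the
// index is walked backwards, so startkey and endkey must be swapped.
// Relies on string order of _ids matching time order (equal digit counts).
function latestOptions(prefix) {
  return { startkey: prefix + '\uffff', endkey: prefix, descending: true, limit: 1, include_docs: true };
}

// Parse the trailing millisecond timestamp from an _id such as
// "shashi@stackoverflow.com-user-1464772888286".
function timestampOf(id) {
  return Number(id.slice(id.lastIndexOf('-') + 1));
}

// Pick the row with the largest timestamp; compares numerically.
// Returns null for an empty result.
function latestRow(rows) {
  return rows.reduce(
    (best, row) => (best === null || timestampOf(row.id) > timestampOf(best.id) ? row : best),
    null
  );
}

// With PouchDB, something like:
//   db.allDocs(latestOptions(prefix)).then(res => res.rows[0]); // newest match, if any
const rows = [
  { id: 'shashi@stackoverflow.com-user-1464772888286' },
  { id: 'shashi@stackoverflow.com-user-1464772999999' },
];
console.log(latestRow(rows).id);
// prints "shashi@stackoverflow.com-user-1464772999999"
```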

The app I am building is a desktop app: docs are created and modified only on the desktop and synced to the server for warehousing, reporting and analysis. So the chance of someone else modifying a doc and causing a conflict is minimal.

Initially, I kept the same _id and updated the doc in place. However, I hit a weird error: PouchDB threw an error when I tried to update the doc, yet different revisions with identical data were being created on the CouchDB server that PouchDB is configured to sync with. Since I was short of time and only building a proof of concept, I switched to putting timestamps in the _id.

Now, however, I am wondering what the potential pitfalls of this approach are. My instinct says there are some, since I haven't seen anyone else take this approach, but I am not sure what they are.

CouchDB is most efficient when doing lookups by the primary _id. To use startkey and endkey you need to query a view, which means you lose some performance and have to deal with somewhat more complex views.

For example, if you do a lot of inserts, there will be a small delay in getting responses from the view while its index updates. It will also take more disk space.

I would also argue that the deletes you plan to do will hurt performance much more than retrieving the doc before updating it: in CouchDB a delete is itself a write that adds a tombstone revision (_deleted: true), and that tombstone stays in the database and is replicated. It is better to let compaction deal with old revisions during quiet periods.

Finally, since you always create new docs, I assume you always have the full doc in hand when you write it (otherwise, how could you store it without retrieving it first?). If so, you could also keep the _rev value that comes back in the response to the PUT and use it for the next update, so you never have to request the doc.
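A minimal sketch of that suggestion, assuming a PouchDB-style db whose put(doc) resolves to { ok, id, rev }. RevCachingStore and FakeDb are hypothetical names; FakeDb is an in-memory stand-in used only so the example runs without PouchDB, and its "N-fake" revisions imitate the shape, not the content, of real _rev values.

```javascript
// Remember the _rev returned by each put() so the next update of the same
// _id does not need a get() first.
class RevCachingStore {
  constructor(db) {
    this.db = db;
    this.revs = new Map(); // _id -> latest known _rev
  }

  async save(doc) {
    const known = this.revs.get(doc._id);
    // Attach the cached _rev if we have one; a brand-new doc is written without it.
    const toWrite = known ? { ...doc, _rev: known } : { ...doc };
    const res = await this.db.put(toWrite); // rejects on conflict; cache untouched
    this.revs.set(res.id, res.rev); // remember the new revision for next time
    return res;
  }
}

// Hypothetical in-memory stand-in for a database: rejects a write whose _rev
// does not match the stored one, mimicking a 409 conflict.
class FakeDb {
  constructor() {
    this.docs = new Map();
  }

  async put(doc) {
    const current = this.docs.get(doc._id);
    const currentRev = current ? current._rev : undefined;
    if (doc._rev !== currentRev) {
      const err = new Error('Document update conflict');
      err.status = 409;
      throw err;
    }
    const gen = current ? Number(currentRev.split('-')[0]) + 1 : 1;
    const rev = `${gen}-fake`;
    this.docs.set(doc._id, { ...doc, _rev: rev });
    return { ok: true, id: doc._id, rev };
  }
}

(async () => {
  const store = new RevCachingStore(new FakeDb());
  await store.save({ _id: 'mittens', age: 3 });
  const res = await store.save({ _id: 'mittens', age: 4 }); // no get() needed
  console.log(res.rev); // prints "2-fake"
})();
```

The cache is only trustworthy while this client is the sole writer, which matches the desktop-only scenario in the question. If a put is rejected with a conflict (for example after a sync brought in a newer revision), falling back to a get() before retrying would be the safe path.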

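Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.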

 
© 2020-2024 STACKOOM.COM