ArangoDB: get document update date from commit log

Is it possible to obtain a record's update date from the ArangoDB commit logs, if there is such a thing as commit logs? We have a couple of documents which were updated, but we did update their modified date field. We would, however, like to retrieve all updated/changed documents since a certain date.

There are two solutions for this:

Solution one:

The first solution is to not use the commit log, but to run an AQL query on the collection and filter on the modified date field. This will be efficient if there is a sorted index (i.e. a skiplist index) on the modified field.

An example setup for this can be found in the following script, which populates a collection test with 50K documents with random modification dates:

/* use some fixed base date to make query produce results */
var baseDate = 1478779081650; /* 2016-11-10T11:58:01.650Z */
db._create("test");
db.test.ensureIndex({ type: "skiplist", fields: [ "modified" ]});

/* create 50,000 documents with modified dates between
   2016-11-10T11:58:01.650Z and up to two years in the past */
for (var i = 0; i < 50000; ++i) {
  db.test.insert({ value: i, modified: new Date(baseDate - Math.floor(Math.random() * 1000 * 60 * 60 * 24 * 365 * 2)).toISOString() });
}

Then, using AQL, it is straightforward to find documents with a modified date at or after a specific value:

var query = "FOR doc IN test FILTER doc.modified >= @date RETURN doc"; 
/* find all documents modified since 2016-11-09T12:00:00.000Z */
var docs = db._query(query, { date: "2016-11-09T12:00:00.000Z" }).toArray();
require("internal").print(docs);

It's also possible to query on date ranges, e.g.:

var query = "FOR doc IN test FILTER doc.modified >= @from && doc.modified <= @to RETURN doc"; 
var docs = db._query(query, { from: "2016-11-09T00:00:00.000Z", to: "2016-11-09T23:59:59.999Z" }).toArray();
require("internal").print(docs);
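
The string comparisons in these filters behave like date comparisons because the modified values are ISO 8601 UTC strings as produced by toISOString(): they have a fixed width, so their string order matches their chronological order. A minimal sketch in plain JavaScript (not ArangoDB-specific; the sample timestamps are made up for illustration) that checks this:

```javascript
/* Sample timestamps (made up for illustration), all in toISOString() format */
var stamps = [
  new Date(Date.UTC(2016, 10, 9, 12, 0, 0)).toISOString(),
  new Date(Date.UTC(2015, 5, 26, 14, 18, 30)).toISOString(),
  new Date(Date.UTC(2016, 10, 10, 11, 58, 1)).toISOString()
];

/* Sort once as plain strings, once by numeric timestamp */
var byString = stamps.slice().sort();
var byTime = stamps.slice().sort(function (a, b) {
  return Date.parse(a) - Date.parse(b);
});

/* Both orders agree, so >= and <= on the strings act like date comparisons */
var sameOrder = JSON.stringify(byString) === JSON.stringify(byTime);
```

This only holds while all stored values share the same format and time zone (here UTC with a trailing Z); mixing formats in the modified field would break the ordering.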

Solution two:

The second solution is to use the WAL change log that ArangoDB also exposes via its HTTP API. But this is much more complicated and requires keeping state on the client side.

The basic idea is to query the WAL change log API at /_api/replication/logger-follow for the given collection. This API call can be given an initial tick value, which controls where in the change log the request will start looking. In the beginning this tick value is unknown, so simply omit it. Using curl, the call for the collection test would be:

curl -X GET "http://127.0.0.1:8529/_db/_system/_api/replication/logger-follow?collection=test" --basic --user "root:" --dump -

Each call to this API will produce some HTTP headers with state information, followed by the WAL entries for the collection in chronological order, e.g.:

...
X-Arango-Replication-Checkmore: true
X-Arango-Replication-Lastincluded: 6103060
X-Arango-Replication-Lasttick: 6251758
...
{"tick":"6101295","type":2000,"database":"1","cid":"6101294","cname":"test","data":{"cid":"6101294","deleted":false,"doCompact":true,"indexBuckets":8,"isSystem":false,"isVolatile":false,"maximalSize":33554432,"name":"test","type":2,"version":5,"waitForSync":false}}
{"tick":"6101298","type":2100,"database":"1","cid":"6101294","cname":"test","data":{"fields":["modified"],"id":"6101297","sparse":false,"type":"skiplist","unique":false}}
{"tick":"6101302","type":2300,"tid":"0","database":"1","cid":"6101294","cname":"test","data":{"_id":"test/6101300","_key":"6101300","_rev":"6101300","modified":"2015-06-26T14:18:30.732Z","value":0}}
{"tick":"6101305","type":2300,"tid":"0","database":"1","cid":"6101294","cname":"test","data":{"_id":"test/6101304","_key":"6101304","_rev":"6101304","modified":"2016-11-09T07:14:08.146Z","value":1}}
{"tick":"6101308","type":2300,"tid":"0","database":"1","cid":"6101294","cname":"test","data":{"_id":"test/6101307","_key":"6101307","_rev":"6101307","modified":"2015-05-14T04:45:01.202Z","value":2}}
...

As can be seen, the change log contains not only the insert/update operations for the documents but also the creation of the collection and the creation of the index. It will also contain all remove operations and other operations that change the meta-data of the collection.

Using the change log results, you can now filter them on the client side for type 2300, which is a document insert or update operation, and then peek into data.modified of each returned document. You can then use the documents which satisfy your search condition.
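
The client-side filtering described above could look roughly like the following sketch. The helper name changedSince and the shortened sample body are hypothetical; the sketch assumes the response body holds one JSON object per line, as in the output shown earlier.

```javascript
/* Hypothetical helper: takes the raw response body (one JSON object per line)
   and returns the documents inserted or updated at or after `since` */
function changedSince(body, since) {
  return body
    .split("\n")
    .filter(function (line) { return line.trim() !== ""; })
    .map(function (line) { return JSON.parse(line); })
    /* type 2300 is a document insert or update operation */
    .filter(function (entry) { return entry.type === 2300 && entry.data; })
    /* ISO 8601 UTC strings compare chronologically as plain strings */
    .filter(function (entry) {
      return typeof entry.data.modified === "string" && entry.data.modified >= since;
    })
    .map(function (entry) { return entry.data; });
}

/* Sample body modeled on the change log output (fields shortened) */
var sampleBody = [
  '{"tick":"6101298","type":2100,"cname":"test","data":{"fields":["modified"],"type":"skiplist"}}',
  '{"tick":"6101302","type":2300,"cname":"test","data":{"_key":"6101300","modified":"2015-06-26T14:18:30.732Z","value":0}}',
  '{"tick":"6101305","type":2300,"cname":"test","data":{"_key":"6101304","modified":"2016-11-09T07:14:08.146Z","value":1}}'
].join("\n");

var recent = changedSince(sampleBody, "2016-11-09T00:00:00.000Z");
```

Note that the helper does not deduplicate: a document that was written several times would appear once per logged operation.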

Note that the result of a request may not contain all operations, only a fraction of them. It may be necessary to fetch more data from the server. This can be done by calling the API again, now using the value of the X-Arango-Replication-Lastincluded HTTP response header as the from tick value, e.g.:

curl -X GET "http://127.0.0.1:8529/_db/_system/_api/replication/logger-follow?collection=test&from=6103060" --basic --user "root:" --dump -

This will produce more operations. You can call the API again and again until it produces no more results and the value of the X-Arango-Replication-Checkmore HTTP response header becomes false. That means you have fetched all operations for the time being.
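
The polling procedure described above can be sketched as follows. The names fetchAllChanges and fetchPage are hypothetical, and the page shape { checkMore, lastIncluded, body } is an assumption made for this sketch. The example fetchPage assumes a runtime with global fetch and btoa (e.g. a recent Node.js) and reuses the server, collection, and credentials from the curl calls. Treating a missing or "0" Lastincluded value as an empty page is also an assumption.

```javascript
/* Hypothetical driver. `fetchPage(fromTick)` is an assumed, caller-supplied
   function that performs one logger-follow request and resolves to
   { checkMore: boolean, lastIncluded: string, body: string }. */
async function fetchAllChanges(fetchPage, startTick) {
  var tick = startTick; /* undefined on the first call, so `from` is omitted */
  var bodies = [];
  while (true) {
    var page = await fetchPage(tick);
    /* Assumption: a missing or "0" Lastincluded value means the page was empty */
    if (!page.lastIncluded || page.lastIncluded === "0") {
      break;
    }
    bodies.push(page.body);
    tick = page.lastIncluded; /* state to keep for the next request */
    if (!page.checkMore) {
      break; /* server reports nothing more to fetch for now */
    }
  }
  return { lastTick: tick, body: bodies.join("\n") };
}

/* One possible fetchPage, mirroring the curl calls (not executed here) */
async function fetchPage(fromTick) {
  var url = "http://127.0.0.1:8529/_db/_system/_api/replication/logger-follow?collection=test";
  if (fromTick !== undefined) {
    url += "&from=" + encodeURIComponent(fromTick);
  }
  var response = await fetch(url, {
    headers: { Authorization: "Basic " + btoa("root:") }
  });
  return {
    checkMore: response.headers.get("x-arango-replication-checkmore") === "true",
    lastIncluded: response.headers.get("x-arango-replication-lastincluded"),
    body: await response.text()
  };
}
```

Calling fetchAllChanges(fetchPage) would collect everything currently available; passing the returned lastTick as the second argument on a later run resumes from where the previous run stopped.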

This solution requires the client to potentially issue multiple HTTP requests and keep state (the last fetched tick value), so it's not as easy to use as the AQL-based solution.

