Get record count in Azure DocumentDb

It seems that 'select count(*) from c' is not supported in the SQL queries allowed by DocumentDB, either on the Azure site or through the DocumentDB explorer ( https://studiodocumentdb.codeplex.com/ ). To date, the only way I have found to get a record count is from code (see below). However, there are now enough documents in our collection that this is crashing. Is there a way to count the documents in a collection that scales better than my solution?

DocumentClient dc = GetDocumentDbClient();
var databaseCount = dc.CreateDatabaseQuery().ToList();
Database azureDb = dc.CreateDatabaseQuery().Where(d => d.Id == Constants.WEATHER_UPDATES_DB_NAME).ToArray().FirstOrDefault();

var collectionCount = dc.CreateDocumentCollectionQuery(azureDb.SelfLink).ToList();

DocumentCollection update = dc.CreateDocumentCollectionQuery(azureDb.SelfLink).Where(c => c.Id == "WeatherUpdates").ToArray().FirstOrDefault();

var documentCount = dc.CreateDocumentQuery(update.SelfLink, "SELECT * FROM c").ToList();

MessageBox.Show("Databases: " + databaseCount.Count().ToString() + Environment.NewLine
                +"Collections: " + collectionCount.Count().ToString() + Environment.NewLine
                + "Documents: " + documentCount.Count().ToString() + Environment.NewLine, 
                 "Totals", MessageBoxButtons.OKCancel); 

This is now possible, as of 2017:

SELECT VALUE COUNT(1) FROM c

[ 1234 ]

This actually works at this point:

SELECT COUNT(c.id) FROM c

Until the "count" keyword is implemented, you should run your query in a stored procedure on the server. If you only want a count, take care not to fetch all columns/properties in your query.

Select only the id, like this:

  dc.CreateDocumentQuery(update.SelfLink, "SELECT c.id FROM c")

This is now possible in the same way you would write a SQL query:

SELECT VALUE COUNT(1) FROM myCollection


NOTE: COUNT(1) won't work for huge datasets.

You can read more about supported queries here.

Just to recap: here is an example of a count stored procedure in JS with continuation support.

And here is one more tool for DocumentDB that's pretty neat: https://github.com/mingaliu/DocumentDBStudio/releases

Update, March 2017: The latest DDB SDK (see the DDB Aggregates press release) has full support for basic aggregates, though without GROUP BY for now. Here is a Git repo with examples: https://github.com/arramac/azure-documentdb-dotnet/tree/master/samples/code-samples/Queries

I ran a test against a partitioned DocumentDB collection with 200K entities in a single partition. The collection is configured with 10K RU/second.

Client side queries:

  1. "SELECT VALUE COUNT(1) FROM c"

Time elapsed: 2471 ms. Total Request Units (RU) consumed: 6143.35

Note: This is the fastest and cheapest option. Keep in mind, however, that you need to handle continuation on the client side and execute the next query using the returned continuation token; otherwise you may get a partial result/count.

  2. "SELECT COUNT(c.id) FROM c"

Time elapsed: 2589 ms. Total RU: 6682.43

Note: This is very close, but slightly slower and more expensive.
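
The continuation handling mentioned for the client side queries could look roughly like the sketch below. It is not SDK code: `fetchPage` is a hypothetical helper that you would write around whichever SDK you use, and the page shape (`values`, `continuation`) is an assumption made for this example. The point is only that each page may carry a partial count, and the client has to add them up until no continuation token comes back.

```javascript
// Sketch: add up the partial counts returned by each page of an aggregate query.
// `fetchPage` is a hypothetical helper (not an SDK call). Given a continuation
// token (or null for the first page), it must resolve to
// { values: number[], continuation: string | null }.
async function countAll(fetchPage) {
    let total = 0;
    let continuation = null;
    do {
        const page = await fetchPage(continuation);
        for (const partial of page.values) {
            total += partial; // each page may hold only part of the count
        }
        continuation = page.continuation || null;
    } while (continuation);
    return total;
}
```

If the loop stops after the first page, the returned number is only the count for the part of the collection that page covered.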

Server side / Stored Procedure:

  1. If you need a stored proc, there is one provided here: https://github.com/Azure/azure-cosmosdb-js-server/blob/master/samples/stored-procedures/Count.js

But beware, it is problematic: it internally reads all documents in the collection/partition just to calculate the count. As a result it is much slower and a lot more expensive!

Time elapsed: 8584 ms. Total RU: 13419.31

  2. I updated the stored procedure from the link above to improve performance. The full updated Count.js is below. The updated stored proc is much faster and cheaper than the original, and it is on par with the best performing client side query (#1 above):

Time elapsed: 2534 ms. Total RU: 6298.36

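function count(filterQuery, continuationToken) {
    // Counts the documents matching filterQuery (all documents when omitted).
    // Sets the response body to { count, continuationToken }; a non-null token
    // means the caller should invoke the procedure again with it.
    var collection = getContext().getCollection();
    var maxResult = 500000; // cap per invocation, also used as the page size
    var result = 0;

    // Default query projects a constant instead of the document body,
    // which keeps each result small.
    var q = 'SELECT \'\' FROM root';
    if (!filterQuery) {
        filterQuery = q;
    }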

    tryQuery(continuationToken);

    function tryQuery(nextContinuationToken) {
        var responseOptions = { continuation: nextContinuationToken, pageSize: maxResult };

        if (result >= maxResult || !query(responseOptions)) {
            setBody(nextContinuationToken);
        }
    }

    function query(responseOptions) {
        return (filterQuery && filterQuery.length) ?
            collection.queryDocuments(collection.getSelfLink(), filterQuery, responseOptions, onReadDocuments) :
            collection.readDocuments(collection.getSelfLink(), responseOptions, onReadDocuments);
    }

    function onReadDocuments(err, docFeed, responseOptions) {
        if (err) {
            throw 'Error while reading document: ' + err;
        }

        result += docFeed.length;

        if (responseOptions.continuation) {
            tryQuery(responseOptions.continuation);
        } else {
            setBody(null);
        }
    }

    function setBody(continuationToken) {
        var body = { count: result, continuationToken: continuationToken };
        getContext().getResponse().setBody(body);
    }
}
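
Because the procedure above returns a continuation token when it cannot finish in one execution, the caller has to invoke it repeatedly and add up the counts. A minimal sketch of that driver loop follows. `runCountProc` is a hypothetical wrapper (not an SDK call) that you would implement with your SDK's stored procedure execution method; it is assumed to resolve to the procedure's response body.

```javascript
// Sketch: call the count stored procedure until it stops returning a token.
// `runCountProc` is a hypothetical wrapper (not an SDK call) that executes
// the procedure with (filterQuery, continuationToken) and resolves to its
// response body: { count, continuationToken }.
async function countWithStoredProc(runCountProc, filterQuery) {
    let total = 0;
    let token = null;
    do {
        const body = await runCountProc(filterQuery, token);
        total += body.count;            // each invocation starts counting from 0
        token = body.continuationToken; // null once everything has been read
    } while (token);
    return total;
}
```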

This does not currently exist. I had a similar scenario, and we ended up adding a counter to a document attribute that gets updated every time a document is added or deleted. You could even make these two steps part of a stored procedure or a trigger if you want atomicity.
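
As a rough illustration of the counter idea, here is a sketch of a stored procedure that creates a document and increments a counter document in the same transaction. The counter id `_counter`, its `count` property, and the procedure name are assumptions made for this example, and the counter document is assumed to exist already. Since a stored procedure runs against a single partition key, in a partitioned collection this would only maintain a per-partition counter.

```javascript
// Sketch of a stored procedure: insert a document and bump a counter document
// in one transaction. The "_counter" id and "count" property are assumptions.
function createAndCount(doc) {
    var collection = getContext().getCollection();
    var link = collection.getSelfLink();

    var accepted = collection.createDocument(link, doc, function (err) {
        if (err) throw new Error('Create failed: ' + err.message);
        bumpCounter();
    });
    if (!accepted) throw new Error('createDocument was not accepted');

    function bumpCounter() {
        var q = 'SELECT * FROM c WHERE c.id = "_counter"';
        var queued = collection.queryDocuments(link, q, function (err, feed) {
            if (err) throw new Error('Counter lookup failed: ' + err.message);
            if (feed.length !== 1) throw new Error('Counter document not found');

            var counter = feed[0];
            counter.count += 1;
            var replaced = collection.replaceDocument(counter._self, counter, function (err2) {
                if (err2) throw new Error('Counter update failed: ' + err2.message);
                getContext().getResponse().setBody({ count: counter.count });
            });
            if (!replaced) throw new Error('replaceDocument was not accepted');
        });
        if (!queued) throw new Error('queryDocuments was not accepted');
    }
}
```

Any thrown error rolls back the whole procedure, so the new document and the counter update succeed or fail together. A matching procedure for deletes would decrement the counter in the same way.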

My code-based count solution also works, once I selected only the id as Papa Ours pointed out :) To get my original post to work, replace this line:

var documentCount = dc.CreateDocumentQuery(update.SelfLink, "SELECT * FROM c").ToList();

with this line:

var documentCount = dc.CreateDocumentQuery(update.SelfLink, "SELECT c.id FROM c").ToList();

I still like the idea of the stored procedure, as it will work in the DocumentDB Studio (really cool project :)) - https://studiodocumentdb.codeplex.com/

To find the count of records where Id = IdValue:

SELECT COUNT(1) FROM c where c.Id = <IdValue>
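
When the id value comes from user input, it is usually safer to pass it as a query parameter than to paste it into the SQL text. The sketch below builds a query spec in the `{ query, parameters }` shape; the function name is made up for this example, and the property name `c.Id` is taken from the query above (property names are case-sensitive).

```javascript
// Sketch: build a parameterized query spec instead of concatenating the value
// into the SQL text. The function name is hypothetical.
function buildCountByIdQuery(idValue) {
    return {
        query: 'SELECT VALUE COUNT(1) FROM c WHERE c.Id = @id',
        parameters: [{ name: '@id', value: idValue }]
    };
}
```

The returned object can then be handed to whatever query method your SDK provides in place of a plain query string.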
