
Efficiently count Documents with different values for a given field

I am trying to count the number of documents that are in each possible state in a particular Arango collection.

This should be possible in a single pass over all of the documents, using a bucket-sort-like strategy. Iterate over every document. If its state value hasn't been seen before, add a counter for that value to a list, starting at 1. If you have seen that state before, increment its counter. Once you reach the end, you'll have one counter for each possible state in the DB, indicating how many documents are currently stored with that state.
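The single-pass bucket strategy described above is essentially a hash-map count. Here is a minimal client-side sketch in Python. It assumes each document is a plain dict and the field is called `state`; the function name `count_by_field` is hypothetical and is not part of any ArangoDB API:

```python
from typing import Any, Dict, Hashable, Iterable, Mapping


def count_by_field(docs: Iterable[Mapping[str, Any]], field: str = "state") -> Dict[Hashable, int]:
    """Count documents per distinct value of `field` in one pass.

    Documents that lack `field` are counted under the key None (an assumption of this sketch).
    """
    counts: Dict[Hashable, int] = {}
    for doc in docs:
        value = doc.get(field)
        if value not in counts:
            # First time this value is seen: open a new bucket starting at 1.
            counts[value] = 1
        else:
            # Value seen before: increment its existing bucket.
            counts[value] += 1
    return counts


docs = [{"state": "A"}, {"state": "B"}, {"state": "B"}, {"state": "C"}, {"state": "A"}]
print(count_by_field(docs))  # {'A': 2, 'B': 2, 'C': 1}
```

Each document is visited once, and a new state simply creates a new bucket. No code changes are needed when states are added.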

I can't figure out how to express this logic in AQL so I can submit it as a query. My current strategy is:

  1. Loop over all documents, keeping only docs with one particular state.
  2. Loop over all documents again, keeping only docs with a different particular state.
  3. ...
  4. Continue until every state has been filtered.
  5. Return the size of each set.

This works, but I'm sure it's much slower than it should be, because it scans the entire collection once per state. It also means that whenever we add a new state, we have to update the query to loop over all docs one more time, filtering on the new state. A bucket-sort-like query would be quick, and it would not need updating as new states are created.

If these were the documents:

  • {A}
  • {B}
  • {B}
  • {C}
  • {A}

Then I'd like the result to be { A:2, B:2, C:1 }, where A, B, and C are values of a particular field. My current strategy filters like so:

LET docsA = (
    FOR doc in collection
        FILTER doc.state == "A"
        RETURN doc
)

Then I manually construct the return object by calling LENGTH on each list of docs.
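To make the cost of that approach concrete, here is a rough Python model of it. It assumes documents are dicts with a `state` key and that the list of states is hardcoded; `count_with_filters` and `known_states` are hypothetical names. Each state costs one full scan, and any state missing from the list is silently left out:

```python
from typing import Any, Dict, Iterable, List, Mapping


def count_with_filters(docs: List[Mapping[str, Any]], known_states: Iterable[str]) -> Dict[str, int]:
    """Model the per-state FILTER + LENGTH strategy: one full scan per known state."""
    result: Dict[str, int] = {}
    for state in known_states:
        # Roughly: LET docsX = (FOR doc IN collection FILTER doc.state == X RETURN doc)
        matching = [doc for doc in docs if doc.get("state") == state]
        # Roughly: LENGTH(docsX) when building the return object by hand.
        result[state] = len(matching)
    return result


docs = [{"state": "A"}, {"state": "B"}, {"state": "B"}, {"state": "C"}, {"state": "A"}, {"state": "D"}]
print(count_with_filters(docs, ["A", "B", "C"]))  # {'A': 2, 'B': 2, 'C': 1}  ("D" is never counted)
```

With k states this makes k passes over the data. A newly introduced state such as "D" only shows up once someone adds it to the hardcoded list.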

Any help or additional info would be greatly appreciated.

What about using the COLLECT operation? (see docs here)

FOR doc IN collection
    COLLECT s = doc.state WITH COUNT INTO c
    RETURN { state: s, count: c }

This would return something like:

[
  { state: 'A', count: 23 },
  { state: 'B', count: 2 },
  { state: 'C', count: 45 }
]
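The result above is an array of `{ state, count }` rows, not the single `{ A:2, B:2, C:1 }` object asked for in the question. As a hedged illustration only, the Python below models what the `COLLECT ... WITH COUNT INTO` query computes over the sample documents from the question, then reshapes the rows into that object on the client. `collect_with_count` and `to_object` are hypothetical helpers, not part of any ArangoDB driver:

```python
from typing import Any, Dict, List, Mapping


def collect_with_count(docs: List[Mapping[str, Any]], field: str = "state") -> List[Dict[str, Any]]:
    """Rough model of: COLLECT s = doc.state WITH COUNT INTO c RETURN { state: s, count: c }."""
    counts: Dict[Any, int] = {}
    for doc in docs:
        key = doc.get(field)
        counts[key] = counts.get(key, 0) + 1
    # Groups come out in first-seen order here; don't rely on any particular row order.
    return [{"state": key, "count": c} for key, c in counts.items()]


def to_object(groups: List[Mapping[str, Any]]) -> Dict[Any, int]:
    """Reshape [{state, count}, ...] into a single {state: count} mapping."""
    return {g["state"]: g["count"] for g in groups}


docs = [{"state": "A"}, {"state": "B"}, {"state": "B"}, {"state": "C"}, {"state": "A"}]
groups = collect_with_count(docs)
print(to_object(groups))  # {'A': 2, 'B': 2, 'C': 1}
```

Reshaping on the client keeps the query itself generic. The one-pass grouping stays on the database side, and only the final formatting happens in application code.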

Would that accomplish what you are after?

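Statement: The technical posts on this site follow the CC BY-SA 4.0 license. If you need to repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.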
