ArangoDB: performance index in array element

I have a collection in ArangoDB populated with elements like this:

{
  "id": "XXXXXXXX",
  "relation": [
    {
      "AAAAA": "AAAAA"
    },
    {
      "BBBB": "BBBBBB",
      "field": {
        "v1": 0,
        "v2": 0,
        "v3": 0
      }
    },
    {
      "CCCC": "CCCC",
      "field": {
        "v1": 0,
        "v2": 1,
        "v3": 2
      }
    }
  ]
}

I want to return only the elements that have field.v1 > 0 (or a combination of v values). I've tried to write an AQL query like this one, but it doesn't use indexes and it is very slow with 200,000+ elements.

FOR a in X
    FILTER LENGTH(a.relation) > 0
    LET relation =  a.relation
    FOR r in relation
        FILTER r.field > null 
        FILTER r.field.v1 > 0
return a

I've tried to create these indexes:

  • full text on relation[*]field
  • skip list on relation[*]field
  • hash on relation[*]field

but with no result.

What can I do? Can you suggest any changes to the query?

Thanks.

Best regards,

Daniele

I suggest the following changes, but they won't speed up the query noticeably:

  • the filters FILTER r.field > null and FILTER r.field.v1 > 0 are redundant. You can just use the latter, FILTER r.field.v1 > 0, and omit the other filter condition.

  • the auxiliary variable LET relation = a.relation is defined after a.relation has already been used in the LENGTH(a.relation) calculation. If the auxiliary variable were defined before the LENGTH() calculation, it could be used inside it like this: LET relation = a.relation FILTER LENGTH(relation) > 0. This will save a bit of processing time.

  • the original query checks each v1 value and may return a document multiple times if multiple v1 values in that document satisfy the filter condition. That means the original query may return more documents than are actually present in the collection. If that's not desired, I suggest using a subquery (see below).

When applying the above modifications to the original query, this is what I came up with:

FOR a IN X 
  LET relation = a.relation
  FILTER LENGTH(relation) > 0 
  LET s = (
    FOR r IN relation
      FILTER r.field.v1 > 0 
      LIMIT 1 
      RETURN 1
  )
  FILTER LENGTH(s) > 0 
  RETURN a

As I said, this probably won't improve performance greatly. However, you may get a different (potentially the desired) result from the query, i.e. fewer documents if multiple v1 values in a document satisfy the filter condition.
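To make the difference in result size concrete, here is a small plain-Python model of the two query shapes. It is only a sketch, not AQL: the sample documents and function names are made up for illustration, and it assumes v1 is either a number or missing.

```python
# Plain-Python model of the two AQL query shapes (illustration only).
# The documents below are hypothetical and not taken from the question.
docs = [
    {"id": "d1", "relation": [{"field": {"v1": 1}}, {"field": {"v1": 2}}]},
    {"id": "d2", "relation": [{"AAAAA": "AAAAA"}, {"field": {"v1": 0}}]},
    {"id": "d3", "relation": []},
]


def v1_matches(r):
    # A missing "field" or "v1" is treated like null, which is not > 0.
    # Assumption: v1 is numeric whenever it is present.
    v1 = (r.get("field") or {}).get("v1")
    return v1 is not None and v1 > 0


def nested_for(documents):
    # Models: FOR a IN X  FOR r IN a.relation  FILTER r.field.v1 > 0  RETURN a
    # One result row per matching array element.
    return [a["id"] for a in documents for r in a["relation"] if v1_matches(r)]


def subquery_limit_1(documents):
    # Models the subquery with LIMIT 1: stop at the first matching element,
    # so every document is returned at most once.
    return [a["id"] for a in documents if any(v1_matches(r) for r in a["relation"])]


print(nested_for(docs))        # ['d1', 'd1']
print(subquery_limit_1(docs))  # ['d1']
```

In this model d1 shows up twice with the nested loop because two of its elements match, while the subquery form reports it once.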

Regarding indexes: fulltext and hash indexes will not help here as they support only equality comparisons, but the query's filter condition is a greater-than comparison. The only index type that could be beneficial here in general would be the skiplist index. However, indexing array values is not supported in 2.7 at all, so indexing relation[*].field won't help, and no index will be used, as you reported.

ArangoDB 2.8 will be the first version that supports indexing individual array values, and there you could create an index on relation[*].field.v1.

Still, the query in 2.8 won't use that index, because array indexes are only used for the IN comparison operator. They cannot be used with a > as in the query. Additionally, when writing the filter condition as FILTER a.relation[*].field.v1 > 0, this would evaluate to FILTER [null, 0, 0] > 0 for the example document above, which will not produce the desired results.
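Why comparing the whole expanded array goes wrong can be sketched in Python. This is a simplified model, assuming AQL's cross-type ordering is null < bool < number < string < array < object (as I recall it from the AQL documentation); the function names are made up for illustration.

```python
# Simplified Python model of AQL's cross-type comparison (illustration only).
# Assumption: values of different types are ordered
# null < bool < number < string < array < object.

def aql_type_rank(value):
    if value is None:
        return 0
    if isinstance(value, bool):  # checked before int: bool is an int subclass
        return 1
    if isinstance(value, (int, float)):
        return 2
    if isinstance(value, str):
        return 3
    if isinstance(value, list):
        return 4
    return 5  # dict, i.e. an object/document


def aql_greater_than(left, right):
    left_rank, right_rank = aql_type_rank(left), aql_type_rank(right)
    if left_rank != right_rank:
        # Different types: the type order alone decides the result.
        return left_rank > right_rank
    # Same type: this sketch only handles numbers and strings.
    return left > right


# Expanded v1 values for the example document in the question.
expanded = [None, 0, 0]

whole_array = aql_greater_than(expanded, 0)
per_element = any(aql_greater_than(v, 0) for v in expanded)

print(whole_array)  # True
print(per_element)  # False
```

Under that assumption the array-versus-number comparison is true for every document regardless of its v1 values, whereas checking the elements one by one gives the intended answer.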

What could help here is a comparison operator modifier (working title) that could tell the operators <, <=, >, >=, ==, != to run the comparison on all members of their left operand. There could be ALL and ANY modifiers, so that the filter condition could be written simply as FILTER a.relation[*].field.v1 ANY > 0. But please note that this is not an existing feature yet, only my quick draft of how this could be fixed in the future.

Fulltext indexes currently can only be used with the FULLTEXT() function.

It's currently not possible to use indexes for determining the length of sub-objects. This is something one could solve using function-defined indexes, once those become available.

Right now the only way to get usable performance for this would be to remember that length in another attribute while writing the documents into the collection:

{
  "id": "XXXXXXXX",
  "length": 6,
  "relation": [
    {
      "AAAAA": "AAAAA"
    },
    {
      "BBBB": "BBBBBB",
      "field": {
        "v1": 0,
        "v2": 0,
        "v3": 0
      }
    },
    {
      "CCCC": "CCCC",
      "field": {
        "v1": 0,
        "v2": 1,
        "v3": 2
      }
    }
  ]
}

<Clippy> you look like you want to be using graph features for your data layout? </Clippy>
