Why is this ArangoDB query too slow?

Question

I am new ArangoDB user and I am using the following query

FOR i IN meteo
FILTER
    i.`POM` == "Maxial"
    && TO_NUMBER(i.`TMP`) < 4.2
    && DATE_TIMESTAMP(i.`DTM`) > DATE_TIMESTAMP("2014-12-10")
    && DATE_TIMESTAMP(i.`DTM`) < DATE_TIMESTAMP("2014-12-15")
RETURN
    i.`TMP`

on a 2 million document collection. It has an index on the three fields that are filtered. It takes aprox. 9 secs on the Web Interface.

Is it possible to run it faster?

Thank you

Hugo

Answer 1

I have no access to the underlying data and data distribution nor the exact index definitions, so I can only give rather general advice:

Use the explain() command in order to see if the query makes use of indexes, and if yes, which.
If explain() shows that no index is used, check if the attributes contained in the query's FILTER conditions are actually indexed. There is the db.<collection>.getIndexes() command to check which attributes are indexed.
If indexes are present but not used by query, the indexes may have the wrong type. For example, a hash index will only be used for equality comparisons (ie == ) but not for other comparison types ( < , <= , > , >= etc.). A hash index will only be used if all the indexed attributes are used in the query's FILTER conditions. A skiplist index will only be used if at least its first attribute is used is used in a FILTER condition. If further of the skiplist index attributes are specified in the query (from left-to-right), they may also be used and allow to filter more documents.
Only a single index will be picked when scanning a collection. Having multiple, separate indexes on "POM" , "TMP" , and "DTM" won't help this query because it will only use one of them per collection that it iterates over. Instead, I suggest trying to put multiple attributes into an index if the query could benefit from this.
The more selective an index is, the better. For example, an index on a single attribute may filter a lot of documents, but a combined index on multiple attributes may filter even more. For this particular query, a skiplist index on [ "POM", "DTM" ] may be the right choice (in combination with 6.)
The only attribute for which the optimizer may consider an index lookup in the given original query is the "POM" attribute. The reason is that the other attributes are used inside function calls (ie TO_NUMBER() , DATE_TIMESTAMP() ). In general, indexes will not be used for attributes which are used inside functions (eg for TO_NUMBER(i.tmp) < 4.2 no index will be used. Same for DATE_TIMESTAMP(i.DTM) > DATE_TIMESTAMP("2014-12-10") . Modifying the conditions so the indexed attributes are directly compared to some constant or a one-time calculated value can enable more candidate indexes. If possible, try to rewrite the conditions so that only the indexed attributes are present on the one side of the comparison. For this particular query, it would be better to use i.DTM > "2014-12-10" instead of DATE_TIMESTAMP(i.DTM) > DATE_TIMESTAMP("2014-12-10") .

Why is this ArangoDB query too slow?

Question

1 answers

solution1
3 ACCPTED 2015-01-15 23:06:47

Why is this ArangoDB query too slow?

Question

1 answers

solution1 3 ACCPTED 2015-01-15 23:06:47

solution1
3 ACCPTED 2015-01-15 23:06:47