如何在ArangoDB 2.7中提高检索查询性能

Question

我是python和ArangoDB的初学者。 我在单个集合名称“DSP”上使用ArangoDB中的数据。 我的查询是：

for k in 
    (for t in DSP return [t.data])
        for z in k
           for p in z
              filter p.name == "name" || 
                     p.content == "pdf" ||
                     p.content == "xml" ||
                     p.name == "Book"
              return p

和已存储的json数据：in以类似的格式

{"data": [{"content": "Java", "type": "string", "name": "name", "key": 1}, {"content": "D:/Java", "type": "string", "name": "location", "key": 1}, {"content": "File folder", "type": "string", "name": "type", "key": 1}, {"content": 1896038645, "type": "int", "name": "size", "key": 1}, {"content": 7, "type": "string", "name": "child_folder_count", "key": 1}, {"content": 7, "type": "string", "name": "child_file_count", "key": 1}, {"content": "parse_dir.py", "type": "string", "name": "name", "key": 101}, {"content": "D:/Java/parse_dir.py", "type": "string", "name": "location", "key": 101}, {"content": "py", "type": "string", "name": "mime-type", "key": 101}, {"content": 4032, "type": "string", "name": "size", "key": 101}, {"content": "Wed Dec 30 21:36:32 2015", "type": "string", "name": "created_date", "key": 101}, {"content": "Wed Dec 30 21:42:38 2015", "type": "string", "name": "modified_date", "key": 101}, {"content": "result.json", "type": "string", "name": "name", "key": 102}, {"content": "D:/Java/result.json", "type": "string", "name": "location", "key": 102}, {"content": "json", "type": "string", "name": "mime-type", "key": 102}, {"content": 1134450, "type": "string", "name": "size", "key": 102}, {"content": "Wed Dec 30 21:36:45 2015", "type": "string", "name": "created_date", "key": 102}, {"content": "Wed Dec 30 21:36:45 2015", "type": "string", "name": "modified_date", "key": 102}, {"content": "rmi1.rar", "type": "string", "name": "name", "key": 103}, {"content": "D:/Java/rmi1.rar", "type": "string", "name": "location", "key": 103}, {"content": "rar", "type": "string", "name": "mime-type", "key": 103}, {"content": 165116, "type": "string", "name": "size", "key": 103}, {"content": "Sun Aug 25 07:29:52 2013", "type": "string", "name": "created_date", "key": 103}, {"content": "Tue Aug 30 16:18:34 2011", "type": "string", "name": "modified_date", "key": 103}, {"content": "servlet.rar", "type": "string", "name": "name", "key": 104}, {"content": "D:/Java/servlet.rar", "type": "string", "name": "location", "key": 104}, {"content": "rar", "type": "string", "name": "mime-type", "key": 104}, {"content": 782, "type": "string", "name": "size", "key": 104}, {"content": "Sun Aug 25 07:29:52 2013", "type": "string", "name": "created_date", "key": 104}, {"content": "Tue Aug 30 16:18:30 2011", "type": "string", "name": "modified_date", "key": 104}, {"content": "crawler projects", "type": "string", "name": "name", "key": 2}, {"content": "D:/Java/crawler projects", "type": "string", "name": "location", "key": 2}, {"content": "File folder", "type": "string", "name": "type", "key": 2}, {"content": 1886842316, "type": "int", "name": "size", "key": 2}, {"content": 5, "type": "string", "name": "child_folder_count", "key": 2}, {"content": 5, "type": "string", "name": "child_file_count", "key": 2}, {"content": ".metadata", "type": "string", "name": "name", "key": 3}, {"content": "D:/Java/crawler projects/.metadata", "type": "string", "name": "location", "key": 3}, {"content": "File folder", "type": "string", "name": "type", "key": 3}, {"content": 10131546, "type": "int", "name": "size", "key": 3}, {"content": 2, "type": "string", "name": "child_folder_count", "key": 3}, {"content": 2, "type": "string", "name": "child_file_count", "key": 3}, {"content": ".lock", "type": "string", "name": "name", "key": 301}, {"content": "D:/Java/crawler projects/.metadata/.lock", "type": "string", "name": "location", "key": 301}, {"content": "", "type": "string", "name": "mime-type", "key": 301}, {"content": 0, "type": "string", "name": "size", "key": 301}, {"content": "Sun Aug 25 07:29:52 2013", "type": "string", "name": "created_date", "key": 301}, {"content": "Mon May 30 12:21:45 2011", "type": "string", "name": "modified_date", "key": 301}, {"content": ".log", "type": "string", "name": "name", "key": 302}, {"content": "D:/Java/crawler projects/.metadata/.log", "type": "string", "name": "location", "key": 302}, {"content": "", "type": "string", "name": "mime-type", "key": 302}, {"content": 598, "type": "string", "name": "size", "key": 302}, {"content": "Sun Aug 25 07:29:52 2013", "type": "string", "name": "created_date", "key": 302}, {"content": "Mon May 30 15:29:18 2011", "type": "string", "name": "modified_date", "key": 302}, {"content": "version.ini", "type": "string", "name": "name", "key": 303}, {"content": "D:/Java/crawler projects/.metadata/version.ini", "type": "string", "name": "location", "key": 303}, {"content": "ini", "type": "string", "name": "mime-type", "key": 303}, {"content": 26, "type": "string", "name": "size", "key": 303}, {"content": "Sun Aug 25 07:29:52 2013", "type": "string", "name": "created_date", "key": 303}, {"content": "Mon May 30 15:29:18 2011", "type": "string", "name": "modified_date", "key": 303}, {"content": ".mylyn", "type": "string", "name": "name", "key": 4}, {"content": "D:/Java/crawler projects/.metadata/.mylyn", "type": "string", "name": "location", "key": 4}, {"content": "File folder", "type": "string", "name": "type", "key": 4}, {"content": 920, "type": "int", "name": "size", "key": 4}, {"content": 1, "type": "string", "name": "child_folder_count", "key": 4}, {"content": 1, "type": "string", "name": "child_file_count", "key": 4}, {"content": ".tasks.xml.zip", "type": "string", "name": "name", "key": 401}, {"content": "D:/Java/crawler projects/.metadata/.mylyn/.tasks.xml.zip", "type": "string", "name": "location", "key": 401}, {"content": "zip", "type": "string", "name": "mime-type", "key": 401}, {"content": 250, "type": "string", "name": "size", "key": 401}, {"content": "Sun Aug 25 07:29:52 2013", "type": "string", "name": "created_date", "key": 401}, {"content": "Mon May 30 12:23:18 2011", "type": "string", "name": "modified_date", "key": 401}, {"content": "repositories.xml.zip", "type": "string", "name": "name", "key": 402}, {"content": "D:/Java/crawler projects/.metadata/.mylyn/repositories.xml.zip", "type": "string", "name": "location", "key": 402}, {"content": "zip", "type": "string", "name": "mime-type", "key": 402}, {"content": 420, "type": "string", "name": "size", "key": 402}, {"content": "Sun Aug 25 07:29:52 2013", "type": "string", "name": "created_date", "key": 402}, {"content": "Mon May 30 12:23:18 2011", "type": "string", "name": "modified_date", "key": 402}, {"content": "tasks.xml.zip", "type": "string", "name": "name", "key": 403}, {"content": "D:/Java/crawler projects/.metadata/.mylyn/tasks.xml.zip", "type": "string", "name": "location", "key": 403}, {"content": "zip", "type": "string", "name": "mime-type", "key": 403}, {"content": 250, "type": "string", "name": "size", "key": 403}, {"content": "Sun Aug 25 07:29:52 2013", "type": "string", "name": "created_date", "key": 403}, {"content": "Mon May 30 15:31:16 2011", "type": "string", "name": "modified_date", "key": 403}, {"content": "contexts", "type": "string", "name": "name", "key": 5}, {"content": "D:/Java/crawler projects/.metadata/.mylyn/contexts", "type": "string", "name": "location", "key": 5}, {"content": "File folder", "type": "string", "name": "type", "key": 5}, {"content": 0, "type": "int", "name": "size", "key": 5}, {"content": 0, "type": "string", "name": "child_folder_count", "key": 5}]

因为我正在添加json文档大约100个json文档，每个大约15 MB，或者添加更多n个过滤条件。 查询需要1分钟以上的时间，有时浏览器没有响应。

我在英特尔酷睿i3 2.4 GHz，4 GB内存和160 GB SATA硬盘上进行了这项实验。

请告诉我，首先，如何提高查询的性能？ 我是否需要更改存储结构或更改查询的语法。 以及如何对具有相同键的多个文档执行连接操作，例如，“检索xml类型的文档名称”。

Answer 1

应该有几种方法来改善此查询的性能：

通过子查询从集合DSP选择所有文档，然后对它们进行迭代（ for k in (for t in DSP return [t.data]) for z in k for p in z filter p.name == "name" ... ）可能比直接使用文档效率低。 尝试用FOR k IN DSP FOR p IN k.data FILTER p.name == "name" ...替换4 FOR循环和子查询FOR k IN DSP FOR p IN k.data FILTER p.name == "name" ... ）
如果查看查询的explain输出，它将显示将不使用索引。 如果集合中有大量文档，并且只想通过查询检索其中的一些文档，那么索引将有助于提高性能。 我建议在data[*].name上使用数组索引，在data[*].content 。 您可以像这样设置它们： db.DSP.ensureIndex({ type: "hash", fields: [ "data[*].name" ] }); db.DSP.ensureIndex({ type: "hash", fields: [ "data[*].content" ] }); db.DSP.ensureIndex({ type: "hash", fields: [ "data[*].name" ] }); db.DSP.ensureIndex({ type: "hash", fields: [ "data[*].content" ] }); 。 注意：这些类型的索引需要ArangoDB 2.8。 使用这些索引，查询也可以简化为： FOR p in DSP FILTER "name" IN p.data[*].name || "Book" IN p.data[*].name || "pdf" IN p.data[*].content... FOR p in DSP FILTER "name" IN p.data[*].name || "Book" IN p.data[*].name || "pdf" IN p.data[*].content... FOR p in DSP FILTER "name" IN p.data[*].name || "Book" IN p.data[*].name || "pdf" IN p.data[*].content... 请注意，索引只能帮助您快速查找包含搜索数据的文档，但不能帮助您快速查找包含搜索数据的文档部分。
调整文档结构可能会有所帮助。 您当前的结构似乎包含每个文档的多个content和name值，例如[ {"content": "Java", "type": "string", "name": "name", "key": 1}, {"content": "D:/Java", "type": "string", "name": "location", "key": 1} ] 。 看起来每个文档只有一个data属性，这是一个数组这些结构。 您可以尝试将每个数组值保存为单独的文档，而不是使用此结构。 例如， {"content": "Java", "type": "string", "name": "name", "key": 1}将成为自己的文档， {"content": "D:/Java", "type": "string", "name": "location", "key": 1}将成为另一个文档等。这似乎是明智的，因为你的子结构似乎已经有一个key属性和几个数组值似乎指的是相同的key 。转换将允许将可能非常大的文档拆分成更小的块，这不仅会使AQL运行得更快（因为它在访问文档时需要解包少得多的数据），但也可以让你摆脱所有嵌套循环，并在返回结果时定位到相关的内部数组值。

如果您调整文档结构，您的查询可以大大简化为FOR p IN DSP FILTER "name" IN p.data[*].name || "Book" IN p.data[*].name || "pdf" IN p.data[*].content ... RETURN p FOR p IN DSP FILTER "name" IN p.data[*].name || "Book" IN p.data[*].name || "pdf" IN p.data[*].content ... RETURN p FOR p IN DSP FILTER "name" IN p.data[*].name || "Book" IN p.data[*].name || "pdf" IN p.data[*].content ... RETURN p ，如果使用索引，应该很快。

如何在ArangoDB 2.7中提高检索查询性能

问题描述

1 个解决方案

解决方案1
3 已采纳 2016-01-04 09:42:51

如何在ArangoDB 2.7中提高检索查询性能

问题描述

1 个解决方案

解决方案1 3 已采纳 2016-01-04 09:42:51

解决方案1
3 已采纳 2016-01-04 09:42:51