
How to improve retrieval query performance in ArangoDB 2.7

I am a beginner with Python and ArangoDB. I am working with data stored in ArangoDB in a single collection named "DSP". My query is:

for k in 
    (for t in DSP return [t.data])
        for z in k
           for p in z
              filter p.name == "name" || 
                     p.content == "pdf" ||
                     p.content == "xml" ||
                     p.name == "Book"
              return p

The stored JSON data has a format similar to this:

{"data": [{"content": "Java", "type": "string", "name": "name", "key": 1}, {"content": "D:/Java", "type": "string", "name": "location", "key": 1}, {"content": "File folder", "type": "string", "name": "type", "key": 1}, {"content": 1896038645, "type": "int", "name": "size", "key": 1}, {"content": 7, "type": "string", "name": "child_folder_count", "key": 1}, {"content": 7, "type": "string", "name": "child_file_count", "key": 1}, {"content": "parse_dir.py", "type": "string", "name": "name", "key": 101}, {"content": "D:/Java/parse_dir.py", "type": "string", "name": "location", "key": 101}, {"content": "py", "type": "string", "name": "mime-type", "key": 101}, {"content": 4032, "type": "string", "name": "size", "key": 101}, {"content": "Wed Dec 30 21:36:32 2015", "type": "string", "name": "created_date", "key": 101}, {"content": "Wed Dec 30 21:42:38 2015", "type": "string", "name": "modified_date", "key": 101}, {"content": "result.json", "type": "string", "name": "name", "key": 102}, {"content": "D:/Java/result.json", "type": "string", "name": "location", "key": 102}, {"content": "json", "type": "string", "name": "mime-type", "key": 102}, {"content": 1134450, "type": "string", "name": "size", "key": 102}, {"content": "Wed Dec 30 21:36:45 2015", "type": "string", "name": "created_date", "key": 102}, {"content": "Wed Dec 30 21:36:45 2015", "type": "string", "name": "modified_date", "key": 102}, {"content": "rmi1.rar", "type": "string", "name": "name", "key": 103}, {"content": "D:/Java/rmi1.rar", "type": "string", "name": "location", "key": 103}, {"content": "rar", "type": "string", "name": "mime-type", "key": 103}, {"content": 165116, "type": "string", "name": "size", "key": 103}, {"content": "Sun Aug 25 07:29:52 2013", "type": "string", "name": "created_date", "key": 103}, {"content": "Tue Aug 30 16:18:34 2011", "type": "string", "name": "modified_date", "key": 103}, {"content": "servlet.rar", "type": "string", "name": "name", "key": 104}, {"content": 
"D:/Java/servlet.rar", "type": "string", "name": "location", "key": 104}, {"content": "rar", "type": "string", "name": "mime-type", "key": 104}, {"content": 782, "type": "string", "name": "size", "key": 104}, {"content": "Sun Aug 25 07:29:52 2013", "type": "string", "name": "created_date", "key": 104}, {"content": "Tue Aug 30 16:18:30 2011", "type": "string", "name": "modified_date", "key": 104}, {"content": "crawler projects", "type": "string", "name": "name", "key": 2}, {"content": "D:/Java/crawler projects", "type": "string", "name": "location", "key": 2}, {"content": "File folder", "type": "string", "name": "type", "key": 2}, {"content": 1886842316, "type": "int", "name": "size", "key": 2}, {"content": 5, "type": "string", "name": "child_folder_count", "key": 2}, {"content": 5, "type": "string", "name": "child_file_count", "key": 2}, {"content": ".metadata", "type": "string", "name": "name", "key": 3}, {"content": "D:/Java/crawler projects/.metadata", "type": "string", "name": "location", "key": 3}, {"content": "File folder", "type": "string", "name": "type", "key": 3}, {"content": 10131546, "type": "int", "name": "size", "key": 3}, {"content": 2, "type": "string", "name": "child_folder_count", "key": 3}, {"content": 2, "type": "string", "name": "child_file_count", "key": 3}, {"content": ".lock", "type": "string", "name": "name", "key": 301}, {"content": "D:/Java/crawler projects/.metadata/.lock", "type": "string", "name": "location", "key": 301}, {"content": "", "type": "string", "name": "mime-type", "key": 301}, {"content": 0, "type": "string", "name": "size", "key": 301}, {"content": "Sun Aug 25 07:29:52 2013", "type": "string", "name": "created_date", "key": 301}, {"content": "Mon May 30 12:21:45 2011", "type": "string", "name": "modified_date", "key": 301}, {"content": ".log", "type": "string", "name": "name", "key": 302}, {"content": "D:/Java/crawler projects/.metadata/.log", "type": "string", "name": "location", "key": 302}, {"content": "", "type": 
"string", "name": "mime-type", "key": 302}, {"content": 598, "type": "string", "name": "size", "key": 302}, {"content": "Sun Aug 25 07:29:52 2013", "type": "string", "name": "created_date", "key": 302}, {"content": "Mon May 30 15:29:18 2011", "type": "string", "name": "modified_date", "key": 302}, {"content": "version.ini", "type": "string", "name": "name", "key": 303}, {"content": "D:/Java/crawler projects/.metadata/version.ini", "type": "string", "name": "location", "key": 303}, {"content": "ini", "type": "string", "name": "mime-type", "key": 303}, {"content": 26, "type": "string", "name": "size", "key": 303}, {"content": "Sun Aug 25 07:29:52 2013", "type": "string", "name": "created_date", "key": 303}, {"content": "Mon May 30 15:29:18 2011", "type": "string", "name": "modified_date", "key": 303}, {"content": ".mylyn", "type": "string", "name": "name", "key": 4}, {"content": "D:/Java/crawler projects/.metadata/.mylyn", "type": "string", "name": "location", "key": 4}, {"content": "File folder", "type": "string", "name": "type", "key": 4}, {"content": 920, "type": "int", "name": "size", "key": 4}, {"content": 1, "type": "string", "name": "child_folder_count", "key": 4}, {"content": 1, "type": "string", "name": "child_file_count", "key": 4}, {"content": ".tasks.xml.zip", "type": "string", "name": "name", "key": 401}, {"content": "D:/Java/crawler projects/.metadata/.mylyn/.tasks.xml.zip", "type": "string", "name": "location", "key": 401}, {"content": "zip", "type": "string", "name": "mime-type", "key": 401}, {"content": 250, "type": "string", "name": "size", "key": 401}, {"content": "Sun Aug 25 07:29:52 2013", "type": "string", "name": "created_date", "key": 401}, {"content": "Mon May 30 12:23:18 2011", "type": "string", "name": "modified_date", "key": 401}, {"content": "repositories.xml.zip", "type": "string", "name": "name", "key": 402}, {"content": "D:/Java/crawler projects/.metadata/.mylyn/repositories.xml.zip", "type": "string", "name": "location", "key": 402}, 
{"content": "zip", "type": "string", "name": "mime-type", "key": 402}, {"content": 420, "type": "string", "name": "size", "key": 402}, {"content": "Sun Aug 25 07:29:52 2013", "type": "string", "name": "created_date", "key": 402}, {"content": "Mon May 30 12:23:18 2011", "type": "string", "name": "modified_date", "key": 402}, {"content": "tasks.xml.zip", "type": "string", "name": "name", "key": 403}, {"content": "D:/Java/crawler projects/.metadata/.mylyn/tasks.xml.zip", "type": "string", "name": "location", "key": 403}, {"content": "zip", "type": "string", "name": "mime-type", "key": 403}, {"content": 250, "type": "string", "name": "size", "key": 403}, {"content": "Sun Aug 25 07:29:52 2013", "type": "string", "name": "created_date", "key": 403}, {"content": "Mon May 30 15:31:16 2011", "type": "string", "name": "modified_date", "key": 403}, {"content": "contexts", "type": "string", "name": "name", "key": 5}, {"content": "D:/Java/crawler projects/.metadata/.mylyn/contexts", "type": "string", "name": "location", "key": 5}, {"content": "File folder", "type": "string", "name": "type", "key": 5}, {"content": 0, "type": "int", "name": "size", "key": 5}, {"content": 0, "type": "string", "name": "child_folder_count", "key": 5}]

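As I add JSON documents (approximately 100 documents of about 15 MB each) or add more filter conditions, the query takes more than a minute, and sometimes the browser stops responding.

I ran this experiment on an Intel Core i3 2.4 GHz machine with 4 GB of RAM and a 160 GB SATA hard disk.

First, how can I improve the performance of the query? Do I need to change the storage structure or the syntax of the query? Second, how can I perform a join across multiple entries that have the same key, for example, "retrieve the names of the documents of type xml"?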

There should be several ways to improve the performance of this query:

  • Selecting all documents from the collection DSP via a subquery and then iterating over them (for k in (for t in DSP return [t.data]) for z in k for p in z filter p.name == "name" ...) is probably less efficient than using the documents directly. Try replacing the four FOR loops and the subquery with:
        FOR k IN DSP FOR p IN k.data FILTER p.name == "name" ...

  • If you look at the explain output of the query, it will show that no index is used. If the collection contains a large number of documents and the query should retrieve only a few of them, an index will help performance. I suggest creating one array index on data[*].name and one on data[*].content. You can set them up like this:
        db.DSP.ensureIndex({ type: "hash", fields: [ "data[*].name" ] });
        db.DSP.ensureIndex({ type: "hash", fields: [ "data[*].content" ] });
    Note: these types of indexes require ArangoDB 2.8. With these indexes in place, the query can also be simplified to:
        FOR p IN DSP FILTER "name" IN p.data[*].name || "Book" IN p.data[*].name || "pdf" IN p.data[*].content ...
    Please note that the indexes only help to quickly find the documents that contain the search data. They do not help to find the parts of those documents that contain it.

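  • Adjusting the document structure may help. Your current structure appears to hold multiple content and name values per document, e.g. [ {"content": "Java", "type": "string", "name": "name", "key": 1}, {"content": "D:/Java", "type": "string", "name": "location", "key": 1} ]. It looks like each document has only one data attribute, which is an array of these structures. Instead of using this structure, you could try saving each array value as a separate document. For example, {"content": "Java", "type": "string", "name": "name", "key": 1} would become its own document, {"content": "D:/Java", "type": "string", "name": "location", "key": 1} would become another document, and so on. This seems sensible because your substructures already have a key attribute, and several array values appear to refer to the same key. The conversion splits potentially very large documents into smaller pieces. This should make AQL run faster, because it needs to unpack much less data when accessing a document. It also lets you get rid of all the nested loops and of the work of locating the relevant inner array values when returning the result.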
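The restructuring described in the last bullet can be sketched in Python as follows. This is a minimal illustration, not code from the answer: the function name `split_entries` is hypothetical, and it assumes that each stored document has a single `data` attribute holding a list of flat objects, as in the question's sample. Loading the resulting documents into a collection is left to whichever driver or import tool you use.

```python
import json


def split_entries(document):
    """Return each element of document["data"] as its own standalone document.

    Assumes (as in the question's sample) that "data" is a list of flat
    objects with "content", "type", "name" and "key" attributes.
    """
    # Copy each entry so the caller's original document is not modified.
    return [dict(entry) for entry in document.get("data", [])]


if __name__ == "__main__":
    # Two entries taken from the sample document in the question.
    big_document = {
        "data": [
            {"content": "Java", "type": "string", "name": "name", "key": 1},
            {"content": "D:/Java", "type": "string", "name": "location", "key": 1},
        ]
    }
    # Print one small JSON document per line. The line-per-document layout
    # is an assumption here; adapt it to your import tool.
    for small_document in split_entries(big_document):
        print(json.dumps(small_document))
```

Note that the `key` attribute in this data is ordinary user data shared by several entries, so it is kept as a regular attribute and cannot double as ArangoDB's unique `_key`.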

If you adjust the document structure, your query can be simplified greatly to:
    FOR p IN DSP FILTER p.name == "name" || p.name == "Book" || p.content == "pdf" ... RETURN p
This should be fast if indexes are used.
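To make the flattened layout more concrete, here is a small Python sketch that mimics in memory what queries over one-entry-per-document data would do. It is an illustration under assumptions rather than the answer's own code: the helper names `filter_entries` and `names_for_mime_type` are hypothetical, and the AQL in `JOIN_QUERY` is one possible way to express the question's "names of the documents of type xml" lookup as a self-join on `key`.

```python
# A possible AQL self-join on the flattened collection (an assumption, not
# taken from the answer): find the entries that record a mime-type, then
# fetch the "name" entry that shares the same key.
JOIN_QUERY = """
FOR t IN DSP
  FILTER t.name == "mime-type" && t.content == @mimeType
  FOR n IN DSP
    FILTER n.key == t.key && n.name == "name"
    RETURN n.content
"""


def filter_entries(entries, names, contents):
    """In-memory equivalent of FILTER p.name IN names || p.content IN contents."""
    return [e for e in entries if e["name"] in names or e["content"] in contents]


def names_for_mime_type(entries, mime_type):
    """In-memory equivalent of JOIN_QUERY: names of items with this mime-type."""
    # First pass: collect the keys of all entries that record the mime-type.
    matching_keys = {
        e["key"]
        for e in entries
        if e["name"] == "mime-type" and e["content"] == mime_type
    }
    # Second pass: return the "name" entries that share one of those keys.
    return [
        e["content"]
        for e in entries
        if e["name"] == "name" and e["key"] in matching_keys
    ]


if __name__ == "__main__":
    # Entries taken from the sample data in the question.
    entries = [
        {"content": "parse_dir.py", "type": "string", "name": "name", "key": 101},
        {"content": "py", "type": "string", "name": "mime-type", "key": 101},
        {"content": "result.json", "type": "string", "name": "name", "key": 102},
        {"content": "json", "type": "string", "name": "mime-type", "key": 102},
    ]
    print(names_for_mime_type(entries, "json"))
```

In the database itself, the inner loop of such a join would probably need an index on `key` to avoid scanning the whole collection for every match. Check the explain output to confirm which indexes are actually used.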
