ArangoDB: Insert as a function of query by example

Question

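Part of my graph is built with a giant join between two large collections, and I run it every time I add documents to either collection. The query is based on an older post.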

FOR fromItem IN fromCollection
    FOR toItem IN toCollection
        FILTER fromItem.fromAttributeValue == toItem.toAttributeValue
        INSERT { _from: fromItem._id, _to: toItem._id, otherAttributes: {}} INTO edgeCollection

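This takes about 55,000 seconds to complete on my dataset. I would absolutely welcome suggestions for making it faster.

But I have two related issues:

1. I need an upsert. Normally, upsert would be fine, but in this case it does not help me, because I have no way of knowing the key up front. To get the key up front, I would need to query by example to find the key of the otherwise identical, existing edge. That seems reasonable as long as it does not kill my performance, but I don't know how to construct the query conditionally in AQL so that it inserts an edge if the equivalent edge does not exist yet, and does nothing if the equivalent edge does exist. How can I do this?
2. I need to run this every time data is added to either collection. I need a way to run it only on the newest data, so that it does not try to join the entire collection. How can I write AQL that joins only the newly inserted records? They are added with Arangoimp, and I have no guarantee about the order in which they are updated, so I cannot create the edges at the same time as the nodes. How can I join only the new data? I don't want to spend 55k seconds every time a record is added.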

Answer 1

If you run the query without any indexes, it has to perform two nested full collection scans, as you can see from the output of

db._explain(<your query here>);

which shows something like:

  1   SingletonNode                1   * ROOT
  2   EnumerateCollectionNode      3     - FOR fromItem IN fromCollection   /* full collection scan */
  3   EnumerateCollectionNode      9       - FOR toItem IN toCollection   /* full collection scan */
  4   CalculationNode              9         - LET #3 = (fromItem.`fromAttributeValue` == toItem.`toAttributeValue`)   /* simple expression */   /* collections used: fromItem : fromCollection, toItem : toCollection */
  5   FilterNode                   9         - FILTER #3
  ...

If you do

db.toCollection.ensureIndex({"type":"hash", fields: ["toAttributeValue"], unique:false})

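then there will be a single full collection scan of fromCollection, and for each item found there will be a hash lookup in toCollection, which is much faster. Everything happens in batches, so this alone should already improve the situation. db._explain() will show: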

  1   SingletonNode                1   * ROOT
  2   EnumerateCollectionNode      3     - FOR fromItem IN fromCollection   /* full collection scan */
  8   IndexNode                    3       - FOR toItem IN toCollection   /* hash index scan */
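
To make the difference between the two plans concrete, here is a minimal sketch in plain JavaScript. It is only an illustration of the idea (nested scans versus one scan plus a hash lookup), not how ArangoDB implements it internally; the sample documents and the function names joinNested and joinHashed are made up for this example.

```javascript
// Illustration only: plain JavaScript, not ArangoDB internals.
// The sample documents below are hypothetical.
const fromCollection = [
  { _id: "fromCollection/1", fromAttributeValue: "a" },
  { _id: "fromCollection/2", fromAttributeValue: "b" },
  { _id: "fromCollection/3", fromAttributeValue: "a" },
];
const toCollection = [
  { _id: "toCollection/1", toAttributeValue: "a" },
  { _id: "toCollection/2", toAttributeValue: "c" },
];

// No index: every fromItem is compared against every toItem.
function joinNested(fromDocs, toDocs) {
  const edges = [];
  let comparisons = 0;
  for (const f of fromDocs) {
    for (const t of toDocs) {
      comparisons++;
      if (f.fromAttributeValue === t.toAttributeValue) {
        edges.push({ _from: f._id, _to: t._id });
      }
    }
  }
  return { edges, comparisons };
}

// Hash structure on toAttributeValue: one scan of fromDocs,
// and one lookup per fromItem instead of a scan of toDocs.
function joinHashed(fromDocs, toDocs) {
  const index = new Map();
  for (const t of toDocs) {
    if (!index.has(t.toAttributeValue)) index.set(t.toAttributeValue, []);
    index.get(t.toAttributeValue).push(t);
  }
  const edges = [];
  let lookups = 0;
  for (const f of fromDocs) {
    lookups++;
    for (const t of index.get(f.fromAttributeValue) || []) {
      edges.push({ _from: f._id, _to: t._id });
    }
  }
  return { edges, lookups };
}

const nested = joinNested(fromCollection, toCollection);
const hashed = joinHashed(fromCollection, toCollection);
console.log(nested.comparisons, hashed.lookups); // prints "6 3"
```

Both functions produce the same edges; only the amount of work differs, and the gap grows with the size of toCollection.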

Working only on recently inserted items in fromCollection is relatively easy: just add a timestamp of the import time to all vertices, and use:

FOR fromItem IN fromCollection
    FILTER fromItem.timeStamp > @lastRun
    FOR toItem IN toCollection
        FILTER fromItem.fromAttributeValue == toItem.toAttributeValue
        INSERT { _from: fromItem._id, _to: toItem._id, otherAttributes: {}} INTO edgeCollection

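Of course, put a skiplist index on the timeStamp attribute in fromCollection.

This should work well for discovering new vertices in fromCollection. It will, however, "overlook" new vertices in toCollection that are linked to old vertices in fromCollection.

You can discover these by swapping the roles of fromCollection and toCollection in your query (do not forget the index on fromAttributeValue in fromCollection) and remembering to insert an edge only if the from vertex is old, as in: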

FOR toItem IN toCollection
    FILTER toItem.timeStamp > @lastRun
    FOR fromItem IN fromCollection
        FILTER fromItem.fromAttributeValue == toItem.toAttributeValue
        FILTER fromItem.timeStamp <= @lastRun 
        INSERT { _from: fromItem._id, _to: toItem._id, otherAttributes: {}} INTO edgeCollection
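
The way the two queries split the work can be checked with a small simulation in plain JavaScript. This is only a sketch: the documents, the timestamps, and the function names edgesForNewFrom and edgesForNewTo are made up, and the loops mirror the FILTER conditions of the two AQL queries rather than run them.

```javascript
// Illustration only: mirrors the FILTER logic of the two AQL queries.
// Documents and timestamps are hypothetical.
const lastRun = 100;
const fromCollection = [
  { _id: "fromCollection/old", fromAttributeValue: "x", timeStamp: 50 },
  { _id: "fromCollection/new", fromAttributeValue: "x", timeStamp: 150 },
];
const toCollection = [
  { _id: "toCollection/old", toAttributeValue: "x", timeStamp: 60 },
  { _id: "toCollection/new", toAttributeValue: "x", timeStamp: 160 },
];

// First query: new from vertices joined against all to vertices.
function edgesForNewFrom(fromDocs, toDocs, since) {
  const edges = [];
  for (const f of fromDocs) {
    if (!(f.timeStamp > since)) continue;
    for (const t of toDocs) {
      if (f.fromAttributeValue === t.toAttributeValue) {
        edges.push(`${f._id}->${t._id}`);
      }
    }
  }
  return edges;
}

// Second query: new to vertices joined against OLD from vertices only,
// so pairs already produced by the first query are not inserted twice.
function edgesForNewTo(fromDocs, toDocs, since) {
  const edges = [];
  for (const t of toDocs) {
    if (!(t.timeStamp > since)) continue;
    for (const f of fromDocs) {
      if (f.fromAttributeValue !== t.toAttributeValue) continue;
      if (f.timeStamp <= since) edges.push(`${f._id}->${t._id}`);
    }
  }
  return edges;
}

const newEdges = [
  ...edgesForNewFrom(fromCollection, toCollection, lastRun),
  ...edgesForNewTo(fromCollection, toCollection, lastRun),
];
console.log(newEdges.length, new Set(newEdges).size); // prints "3 3"
```

Of the four matching pairs, only the old/old pair is skipped, since it would have been inserted by an earlier run; the other three are each produced exactly once.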

These two together should do what you want. Please find the full example here.
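
The upsert part of the question is not addressed above. One possible approach, offered here as an assumption and not as part of the original answer, is AQL's UPSERT with a search expression on _from and _to, so that no key has to be known in advance. The sketch below keeps the AQL in a JavaScript string for use with db._query() in arangosh; the variable names upsertEdgesQuery and bindVars and the lastRun value are placeholders, and the behavior and performance should be verified against your ArangoDB version.

```javascript
// Sketch under assumptions: AQL UPSERT keyed on the (_from, _to) pair.
// An existing edge with the same endpoints is left untouched (UPDATE {}),
// otherwise a new edge is inserted.
const upsertEdgesQuery = `
FOR fromItem IN fromCollection
    FILTER fromItem.timeStamp > @lastRun
    FOR toItem IN toCollection
        FILTER fromItem.fromAttributeValue == toItem.toAttributeValue
        UPSERT { _from: fromItem._id, _to: toItem._id }
            INSERT { _from: fromItem._id, _to: toItem._id, otherAttributes: {} }
            UPDATE {}
            IN edgeCollection
`;

// Bind variables for the query; the lastRun value is a placeholder.
const bindVars = { lastRun: 0 };

// In arangosh the query would be run as:
//   db._query(upsertEdgesQuery, bindVars);
```

The extra lookup per candidate edge adds cost, so it is worth comparing the run time with and without the UPSERT on a sample of the data.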

Solution 1
8 2016-10-20 08:47:32