
Strange performance problem with UPDATE in ArangoDB

I am building a Node.js application that uses ArangoDB as its data store. My data structure consists of two collections: one for managing so-called instances, and another one for entities. What I do is the following:

  • There is a document in the instances collection for every instance I have.
  • Whenever I add an entity to the entities collection, I also want to keep track of the entities that belong to a specific instance.
  • So, every instance document has an array field called entities, and I push the ID of each new entity into that array.

The following code shows the general outline:

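// Import the ArangoDB driver and a UUID generator.
// (The awaits below must run inside an async function.)
const { Database, aql } = require("arangojs");
const { v4: uuid } = require("uuid");

// Connect to ArangoDB.
const db = new Database(...);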
db.useBasicAuth(user, password);

// Create the database and switch to it.
await db.createDatabase(database);
db.useDatabase(database);

// Create the instance collection.
const instanceCollection = db.collection(`instances-${uuid()}`);
await instanceCollection.create();

// Create the entities collection.
const entityCollection = db.collection(`entities-${uuid()}`);
await entityCollection.create();

// Set up an instance.
const instance = {
  id: uuid(),
  entities: []
};

// Create the instance in the database.
await db.query(aql`
  INSERT ${instance} INTO ${instanceCollection}
`);

// Add lots of entities.
for (let i = 0; i < scale; i++) {
  // Set up an entity.
  const entity = {
    id: uuid()
  };

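  // Track the entity on the instance. Note that this pushes the whole
  // entity object; pushing entity.id would store only the ID.
  instance.entities.push(entity);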

  // Insert the entity in the database.
  await db.query(aql`
    INSERT ${entity} INTO ${entityCollection}
  `);

  // Update the instance in the database.
  await db.query(aql`
    FOR i IN ${instanceCollection}
      FILTER i.id == ${instance.id}
      UPDATE i WITH ${instance} IN ${instanceCollection} OPTIONS { mergeObjects: false }
  `);
}

The problem is that this becomes dramatically slower the more entities I add. The runtime grows far faster than linearly, although I would have expected linear growth:

Running benchmark 'add and update'
  100 Entities:   348.80ms [+0.00%]
 1000 Entities:  3113.55ms [-10.74%]
10000 Entities: 90180.18ms [+158.54%]

Adding an index helps somewhat, but does not change the overall scaling problem:

Running benchmark 'add and update with index'
  100 Entities:   194.30ms [+0.00%]
 1000 Entities:  2090.15ms [+7.57%]
10000 Entities: 89673.52ms [+361.52%]
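The bracketed percentages in both benchmarks appear to be the deviation from a linear extrapolation of the 100-entity run. For example, 10 × 348.80 ms = 3488 ms would be expected for 1000 entities, but 3113.55 ms was measured, which is 10.74 % below linear. The sketch below reproduces those figures; the helper name deviationFromLinear is hypothetical and not part of any benchmark tool used in the question.

```javascript
// Hypothetical helper: compares each run against a linear extrapolation
// of the first (baseline) run and reports the deviation in percent.
function deviationFromLinear(runs) {
  const [base] = runs;
  return runs.map(({ entities, ms }) => {
    // Ratio of measured time to the time linear scaling would predict.
    const ratio = (ms * base.entities) / (base.ms * entities);
    const pct = Number(((ratio - 1) * 100).toFixed(2));
    return { entities, ms, pct };
  });
}

// Numbers taken from the 'add and update' benchmark above.
const addAndUpdate = [
  { entities: 100, ms: 348.8 },
  { entities: 1000, ms: 3113.55 },
  { entities: 10000, ms: 90180.18 },
];

for (const { entities, ms, pct } of deviationFromLinear(addAndUpdate)) {
  const sign = pct >= 0 ? "+" : "";
  console.log(`${String(entities).padStart(5)} Entities: ${ms.toFixed(2)}ms [${sign}${pct.toFixed(2)}%]`);
}
```

Read this way, +158.54 % at 10000 entities means the run took about 2.6 times as long as linear scaling from the 100-entity run would predict.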

The problem can be traced back to the UPDATE statement. If you leave it out and only use the INSERT statement, things scale linearly. So something seems to be wrong with the update itself. However, I don't understand where the problem is.

This is what I would like to understand: Why does the UPDATE statement get dramatically slower over time? Am I using it wrong? Is this a known problem in ArangoDB? …?

What I am not interested in is discussing this approach: please take it as given. Let's focus on the performance of the UPDATE statement. Any ideas?

UPDATE

As asked for in the comments, here is some information on the system setup:

  • ArangoDB 3.4.6, 3.6.2.1, and 3.7.0-alpha.2 (all running in Docker, on macOS and Linux)
  • Single-server setup
  • ArangoJS 6.14.0 (we also had this with earlier versions, although I can't tell the exact version)

Finding the problem

Have you tried explaining or profiling the query?

Arango's explain plan descriptions are excellent. You can access explain through the built-in Aardvark web admin interface, or with db._explain(query) in arangosh. Here's what yours looks like:

Execution plan:
 Id   NodeType                  Est.   Comment
  1   SingletonNode                1   * ROOT
  5   CalculationNode              1     - LET #5 = { "_key" : "123", "_id" : "collection/123", "_rev" : "_aQcjewq---", ...instance }   /* json expression */   /* const assignment */
  2   EnumerateCollectionNode      2     - FOR i IN collection   /* full collection scan, projections: `_key`, `id` */   FILTER (i.`id` == "1")   /* early pruning */
  6   UpdateNode                   0       - UPDATE i WITH #5 IN pickups 

Indexes used:
 By   Name      Type      Collection   Unique   Sparse   Selectivity   Fields       Ranges
  6   primary   primary   pickups      true     false       100.00 %   [ `_key` ]   i
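If you want to spot this programmatically rather than by reading the table, a minimal sketch is to look for EnumerateCollectionNode entries (the node type shown as a full collection scan above) in the JSON form of the plan. This assumes the explain result has the shape { plan: { nodes: [{ type, collection, ... }] } }, as returned for example by ArangoDB's HTTP explain endpoint; the helper findFullCollectionScans and the plan excerpt are hypothetical, written by hand to mirror the plan above, and are not real explain output.

```javascript
// Sketch, assuming the explain JSON shape { plan: { nodes: [...] } }
// where every node has a `type` and collection nodes name their `collection`.
function findFullCollectionScans(explainResult) {
  return explainResult.plan.nodes
    .filter((node) => node.type === "EnumerateCollectionNode")
    .map((node) => node.collection);
}

// Hand-written excerpt modelled on the plan above (not real explain output).
const examplePlan = {
  plan: {
    nodes: [
      { id: 1, type: "SingletonNode" },
      { id: 5, type: "CalculationNode" },
      { id: 2, type: "EnumerateCollectionNode", collection: "pickups" },
      { id: 6, type: "UpdateNode", collection: "pickups" },
    ],
  },
};

const scans = findFullCollectionScans(examplePlan);
if (scans.length > 0) {
  console.log(`Full collection scan(s) on: ${scans.join(", ")}`);
}
```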

The problem

The key part of the plan is FOR i IN collection /* full collection scan */. A full collection scan will be slow: its cost grows linearly with the size of your collection. So inside your for loop of scale iterations, the total time grows quadratically (not merely linearly) with the size of the collection.
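To make the arithmetic behind "a linear scan inside a loop is quadratic" concrete, here is a toy cost model. It is not a model of ArangoDB internals: it simply assumes the scanned collection grows by one document per iteration, that a full scan touches every stored document, and that an index lookup touches roughly one entry. The function name totalDocsTouched is hypothetical.

```javascript
// Toy cost model: how many documents are touched in total to locate the
// target document once per loop iteration, while the collection grows
// by one document per iteration.
function totalDocsTouched(scale, lookup) {
  let total = 0;
  for (let size = 1; size <= scale; size++) {
    // Full scan reads every document currently stored; an index lookup
    // is modelled as touching a single entry.
    total += lookup === "fullScan" ? size : 1;
  }
  return total;
}

for (const scale of [100, 1000, 10000]) {
  console.log(scale, totalDocsTouched(scale, "fullScan"), totalDocsTouched(scale, "index"));
}
```

The full-scan total is scale × (scale + 1) / 2, so multiplying scale by 10 multiplies the work by roughly 100, while the index-lookup total only grows tenfold.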

Solution

Indexing the id attribute should help, but I think that depends on how you created the index.

Using _key instead of an index on id changes the plan to a primary index scan:

- FOR i IN pickups   /* primary index scan, index only, projections: `_key` */    

A primary index lookup should be (near) constant-time, so with your for loop of scale iterations, the total time should grow linearly.
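A minimal sketch of what a _key-based update could look like, assuming the instance document was inserted with its id copied into _key (for example { _key: instance.id, ...instance }), which the original code does not do. AQL's UPDATE accepts a key expression directly, so no FOR/FILTER loop and no collection scan are involved. The helper buildUpdateByKey is hypothetical; it only builds a { query, bindVars } object, which is the same shape the aql template produces and that arangojs's db.query() accepts.

```javascript
// Hypothetical helper: builds an UPDATE that addresses the document
// directly by _key, so ArangoDB can use the primary index.
function buildUpdateByKey(collectionName, key, patch) {
  return {
    // @@collection is a collection bind parameter; @key and @patch are values.
    query: "UPDATE @key WITH @patch IN @@collection OPTIONS { mergeObjects: false }",
    bindVars: {
      key,
      patch,
      "@collection": collectionName,
    },
  };
}

// Example: the object that would be passed to db.query().
const example = buildUpdateByKey("instances-example", "123", { entities: [] });
console.log(JSON.stringify(example));
```

In the question's loop this would be called as await db.query(buildUpdateByKey(instanceCollection.name, instance.id, instance)), replacing the FOR/FILTER/UPDATE query. Writing it inline with the aql template (UPDATE ${instance.id} WITH ${instance} IN ${instanceCollection}) would be equivalent.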

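Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.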

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM