简体   繁体   中英

Strange performance problem with UPDATE in ArangoDB

I am creating a Node.js application that works with ArangoDB as data store. Basically, what I have as data structure is two tables, one for managing so-called instances , another one for entities . What I do is the following:

  • There is a document in the instances collection for every instance I have.
  • Whenever I add an entity to the entities collection, I also want to keep track of the entities that belong to a specific instance.
  • So, every instance document has an array field for entities , and I push the ID of the entity into that array.

The following code shows the general outline:

// Connect to ArangoDB.
db = new Database(...);
db.useBasicAuth(user, password);

// Use the database.
await db.createDatabase(database);
db.useDatabase(database);

// Create the instance collection.
instanceCollection = db.collection(`instances-${uuid()}`);
await instanceCollection.create();

// Create the entities collection.
entityCollection = db.collection(`entities-${uuid()}`);
await entityCollection.create();

// Setup an instance.
instance = {
  id: uuid(),
  entities: []
};

// Create the instance in the database.
await db.query(aql`
  INSERT ${instance} INTO ${instanceCollection}
`);

// Add lots of entities.
for (let i = 0; i < scale; i++) {
  // Setup an entity.
  const entity = {
    id: uuid()
  };

  // Update the instance.
  instance.entities.push(entity);

  // Insert the entity in the database.
  await db.query(aql`
    INSERT ${entity} INTO ${entityCollection}
  `);

  // Update the instance in the database.
  await db.query(aql`
    FOR i IN ${instanceCollection}
      FILTER i.id == ${instance.id}
      UPDATE i WITH ${instance} IN ${instanceCollection} OPTIONS { mergeObjects: false }
  `);
}

The problem now is that this becomes extremely slow the more entities I add. It basically has exponential growth, although I would have expected linear growth:

Running benchmark 'add and update'
  100 Entities:   348.80ms [+0.00%]
 1000 Entities:  3113.55ms [-10.74%]
10000 Entities: 90180.18ms [+158.54%]

Adding an index has an effect, but does not change anything on the overall problem:

Running benchmark 'add and update with index'
  100 Entities:   194.30ms [+0.00%]
 1000 Entities:  2090.15ms [+7.57%]
10000 Entities: 89673.52ms [+361.52%]

The problem can be tracked down to the UPDATE statement. If you leave it out and only use the database's INSERT statement, things scale linearly. So, something seems to be wrong with the update itself. However, I don't understand where the problem is.

This is what I would like to understand: Why does the UPDATE statement get dramatically slower over time? Am I using it wrong? Is this a known problem in ArangoDB? …?

What I am not interested in is discussing this approach: Please take is as given. Let's focus on the performance of the UPDATE statement. Any ideas?

UPDATE

As asked for in the comments, here some information on the system setup:

  • ArangoDB 3.4.6, 3.6.2.1, and 3.7.0-alpha.2 (all running in Docker, on macOS and Linux)
  • Single-server setup
  • ArangoJS 6.14.0 (we also had this with earlier versions, although I can't tell the exact version)

Finding the problem

Have you tried explaining or profiling the query?

Arango's explan plan descriptions are excellent. You can access explain using the built-in Aardvark web admin interface, or using db._explain(query) . Here's what yours looks like:

Execution plan:
 Id   NodeType                  Est.   Comment
  1   SingletonNode                1   * ROOT
  5   CalculationNode              1     - LET #5 = { "_key" : "123", "_id" : "collection/123", "_rev" : "_aQcjewq---", ...instance }   /* json expression */   /* const assignment */
  2   EnumerateCollectionNode      2     - FOR i IN collection   /* full collection scan, projections: `_key`, `id` */   FILTER (i.`id` == "1")   /* early pruning */
  6   UpdateNode                   0       - UPDATE i WITH #5 IN pickups 

Indexes used:
 By   Name      Type      Collection   Unique   Sparse   Selectivity   Fields       Ranges
  6   primary   primary   pickups      true     false       100.00 %   [ `_key` ]   i

The problem

The key part in the plan is - FOR i IN collection /* full collection scan Full collection scan will be ...slow. It should grow linearly with the size of your collection. So with your for loop of scale iterations, this definitely means exponential growth with the size of the collection.

Solution

Indexing the id should help but I think it depends on how you created the index.

Using _key instead of index changes the plan to show primary

- FOR i IN pickups   /* primary index scan, index only, projections: `_key` */    

This should be constant-time, so with your for loop of scale iterations, this should mean linear time.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM