I am creating a Node.js application that works with ArangoDB as its data store. Basically, what I have as a data structure is two tables, one for managing so-called instances, the other one for entities. What I do is the following: I create an instances collection and an entities collection for every instance I have. Besides storing the entities in the entities collection, I also want to keep track of the entities that belong to a specific instance. For that, the instance document has an array field for entities, and I push the ID of each entity into that array. The following code shows the general outline:
// Connect to ArangoDB.
db = new Database(...);
db.useBasicAuth(user, password);

// Use the database.
await db.createDatabase(database);
db.useDatabase(database);

// Create the instance collection.
instanceCollection = db.collection(`instances-${uuid()}`);
await instanceCollection.create();

// Create the entities collection.
entityCollection = db.collection(`entities-${uuid()}`);
await entityCollection.create();

// Setup an instance.
instance = {
  id: uuid(),
  entities: []
};

// Create the instance in the database.
await db.query(aql`
  INSERT ${instance} INTO ${instanceCollection}
`);

// Add lots of entities.
for (let i = 0; i < scale; i++) {
  // Setup an entity.
  const entity = {
    id: uuid()
  };

  // Update the instance.
  instance.entities.push(entity);

  // Insert the entity in the database.
  await db.query(aql`
    INSERT ${entity} INTO ${entityCollection}
  `);

  // Update the instance in the database.
  await db.query(aql`
    FOR i IN ${instanceCollection}
      FILTER i.id == ${instance.id}
      UPDATE i WITH ${instance} IN ${instanceCollection} OPTIONS { mergeObjects: false }
  `);
}
The problem now is that this becomes extremely slow the more entities I add. The runtime basically grows quadratically, although I would have expected linear growth:
Running benchmark 'add and update'
100 Entities: 348.80ms [+0.00%]
1000 Entities: 3113.55ms [-10.74%]
10000 Entities: 90180.18ms [+158.54%]
Adding an index has an effect, but does not change anything on the overall problem:
Running benchmark 'add and update with index'
100 Entities: 194.30ms [+0.00%]
1000 Entities: 2090.15ms [+7.57%]
10000 Entities: 89673.52ms [+361.52%]
The problem can be tracked down to the UPDATE statement. If you leave it out and only use the database's INSERT statement, things scale linearly. So something seems to be wrong with the update itself; however, I don't understand where the problem is.

This is what I would like to understand: Why does the UPDATE statement get dramatically slower over time? Am I using it wrong? Is this a known problem in ArangoDB? …?

What I am not interested in is discussing this approach: please take it as given. Let's focus on the performance of the UPDATE statement. Any ideas?
As asked for in the comments, here is some information on the system setup:
Have you tried explaining or profiling the query?

Arango's explain plan descriptions are excellent. You can access explain using the built-in Aardvark web admin interface, or using db._explain(query) in arangosh. Here's what yours looks like:
Execution plan:
Id NodeType Est. Comment
1 SingletonNode 1 * ROOT
5 CalculationNode 1 - LET #5 = { "_key" : "123", "_id" : "collection/123", "_rev" : "_aQcjewq---", ...instance } /* json expression */ /* const assignment */
2 EnumerateCollectionNode 2 - FOR i IN collection /* full collection scan, projections: `_key`, `id` */ FILTER (i.`id` == "1") /* early pruning */
6 UpdateNode 0 - UPDATE i WITH #5 IN pickups
Indexes used:
By Name Type Collection Unique Sparse Selectivity Fields Ranges
6 primary primary pickups true false 100.00 % [ `_key` ] i
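You can also fetch this plan programmatically from Node.js. A minimal sketch, assuming arangojs (which exposes an explain method on the database object); the query mirrors the one from the question, and the exact shape of the returned plan object may vary between arangojs/ArangoDB versions:

```javascript
// Sketch: inspect the execution plan of the slow UPDATE via arangojs.
// `db` is an authenticated arangojs Database, `instanceCollection` a
// collection handle, `instanceId` the custom id field being filtered on.
async function explainUpdate(db, instanceCollection, instanceId) {
  // Lazy require so this sketch parses without a live ArangoDB setup.
  const { aql } = require("arangojs");
  const result = await db.explain(aql`
    FOR i IN ${instanceCollection}
      FILTER i.id == ${instanceId}
      UPDATE i WITH { touched: true } IN ${instanceCollection}
  `);
  // Look for an EnumerateCollectionNode (full scan) vs. an index node.
  for (const node of result.plan.nodes) {
    console.log(node.type);
  }
}
```

Running this before and after adding an index makes the plan change visible without leaving your benchmark script.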
The key part in the plan is FOR i IN collection /* full collection scan */.

A full collection scan will be ...slow. It grows linearly with the size of your collection. So with your for loop of scale iterations, each of which also grows the collection, this means quadratic growth of the total time.
Indexing the id field should help, but I think it depends on how you created the index.
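For reference, a sketch of creating such an index with arangojs (ensureIndex is the arangojs call for ArangoDB 3.5+; the collection handle and field name are taken from the question):

```javascript
// Sketch: add a persistent index on the custom `id` field so the
// FILTER can use an index scan instead of a full collection scan.
async function createIdIndex(collection) {
  return collection.ensureIndex({
    type: "persistent",
    fields: ["id"],
    // The ids are uuids, so a unique index is safe and gives the
    // optimizer the best possible selectivity estimate.
    unique: true
  });
}
```

After creating the index, re-run explain and check that the EnumerateCollectionNode has been replaced by an index node.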
Using _key instead of the custom id field changes the plan to use the primary index:

- FOR i IN pickups /* primary index scan, index only, projections: `_key` */

A primary index lookup should be constant-time, so with your for loop of scale iterations, this should mean linear time overall.
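To illustrate, a sketch of the constant-time variant, assuming the instance was inserted with its uuid as the document _key (uuids are valid ArangoDB keys, so INSERT { _key: instance.id, ... } works); the names come from the question's code:

```javascript
// Sketch: update the instance via its _key, hitting the primary index
// directly instead of filtering on a custom field.
async function updateInstanceByKey(db, instanceCollection, instance) {
  // Lazy require so this sketch parses without a live ArangoDB setup.
  const { aql } = require("arangojs");
  await db.query(aql`
    UPDATE { _key: ${instance.id} }
    WITH ${instance} IN ${instanceCollection}
    OPTIONS { mergeObjects: false }
  `);
}
```

This removes the FOR/FILTER entirely, so each update costs the same regardless of how large the collection has grown.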