
ArangoDB - Is indexing better than having more collections?

Original 2020-06-08 10:28:39 4 1 database/ indexing/ collections/ arangodb

I have 3 types of entity:

Subjects
Topics
Tasks

In each subject there are topics and tasks. The topics can depend on each other. (Of course, a topic that belongs to subject sj1 can only depend on another topic that also belongs to sj1.)

Between tasks and topics there are connections (both ends must belong to the same subject) that represent the fact that to solve a certain task we need to be aware of certain topics.

So a task can require multiple topics, and a topic can be required by multiple tasks. (N<--->M connection.)

What would be the best way to store this?

Solution 1
- Have 3 collections, one for each type of entity.
- In tasks and topics, have an index on a subject identifier attribute.
- Have an edge collection for storing connections between topics [N]<-->[M] tasks.

Solution 2
- Have 1 collection for the subjects.
- For each subject, have 1 topics and 1 tasks collection. The connection between subjects and tasks/topics can be based on the prefix of the collection names (e.g. for the chemistry subject we have chemistry_tasks and chemistry_topics collections).
- For each subject, have an edge collection for connections between the tasks and topics, and another edge collection for connections among topics (e.g. chemistry_topics_tasks_connections and chemistry_topics_connections).

With Solution 2, if I want to search among the topics or tasks of a subject, I don't need to pre-filter them using the subject identifier index. I immediately get the collection that contains all of my data. Moreover, I don't have the overhead of an index entry for each document in tasks and topics. On the other hand, this will result in a mess of collections.
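To make the trade-off concrete, here is a minimal sketch of how the "all topics of one subject" lookup might be phrased under each solution. It only builds AQL text and bind variables and sends nothing to a server. The attribute name `subject` and the helper names are assumptions for illustration; the collection names follow the examples in the question.

```python
# Sketch only: builds AQL text plus bind variables, no database connection.
# The attribute name `subject` and both helper names are hypothetical.

def topics_query_single_collection(subject_key):
    """Solution 1: one shared `topics` collection, filtered on an indexed attribute."""
    query = "FOR t IN topics FILTER t.subject == @subject RETURN t"
    return query, {"subject": subject_key}


def topics_query_per_subject(subject_key):
    """Solution 2: one collection per subject, chosen by collection-name prefix."""
    # `@@coll` is a collection bind parameter; its key in the bind variables is "@coll".
    query = "FOR t IN @@coll RETURN t"
    return query, {"@coll": f"{subject_key}_topics"}


if __name__ == "__main__":
    print(topics_query_single_collection("chemistry"))
    print(topics_query_per_subject("chemistry"))
```

In the first form the query text stays the same for every subject and only a value changes. In the second form the application has to derive a collection name from the subject before it can query at all.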

Sidenote: There will be at most 50 subjects, but the number of tasks and topics is unlimited.

1 answer

In your terms, "awareness" is expressed through the graph, which requires no extra indexing to work at its best. ArangoDB automatically creates a primary index on "_key" and, for edge collections, an edge index on "_from"/"_to", which it uses for graph traversal.
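As a rough illustration of why lookups by `_from` or `_to` need no extra index, the following toy model builds both directions of an edge lookup in plain Python. It is not how ArangoDB implements its edge index; the edge collection name `requires` and the document keys are made up for the example.

```python
from collections import defaultdict

# Hypothetical edge documents from a `requires` edge collection (task -> topic).
edges = [
    {"_from": "tasks/t1", "_to": "topics/acids"},
    {"_from": "tasks/t1", "_to": "topics/bases"},
    {"_from": "tasks/t2", "_to": "topics/acids"},
]


def build_edge_index(edge_docs):
    """Toy stand-in for an edge index: one lookup table per direction."""
    out_index = defaultdict(list)  # _from -> list of _to
    in_index = defaultdict(list)   # _to   -> list of _from
    for edge in edge_docs:
        out_index[edge["_from"]].append(edge["_to"])
        in_index[edge["_to"]].append(edge["_from"])
    return out_index, in_index


out_index, in_index = build_edge_index(edges)

# Roughly what `FOR v IN 1..1 OUTBOUND "tasks/t1" requires RETURN v` walks:
topics_for_t1 = out_index["tasks/t1"]
# And the reverse direction (INBOUND from a topic):
tasks_for_acids = in_index["topics/acids"]
```

Both directions resolve with a single keyed lookup, which is the property the answer relies on for the N<--->M connection between tasks and topics.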

As for indexing, that is all about search performance: indexes are added based on the data you want to find. It really comes down to how you want to search:

- one collection with multiple entity types, or
- multiple collections segregated by entity type.

There is no penalty for having large collections, and a graph can link documents within a single collection - it doesn't need them to be segregated. Also, you can have multiple edge collections and/or multiple document collections. These are some of the concepts that challenge those of us who, like me, come from a traditional RDBMS background - "schemaless" or "multi-model" databases kinda turn normalization on its ear.

Personally, I choose to build fairly large collections based on the data source (I import data from external sources). Each collection contains documents of multiple object/data schemas, identified by an objType attribute. The benefit here is that you can search all documents in the collection on a single field (or even an index with multiple fields, like title + objType), very quickly reducing the set of documents to iterate/traverse - this is usually where real performance gains are made.

So... I guess I recommend solution #3?
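A minimal sketch of the title + objType idea, assuming a persistent index defined through ArangoDB's HTTP index API (`POST /_api/index?collection=...`). The code only builds the request body and the matching AQL query; it does not contact a server. The collection name `imported` and both helper names are hypothetical.

```python
import json


def persistent_index_definition(fields):
    """Request body for a persistent index over the given attribute paths."""
    return {"type": "persistent", "fields": list(fields)}


def find_by_title_and_type(collection, title, obj_type):
    """AQL that filters on both indexed attributes with equality comparisons."""
    query = (
        "FOR d IN @@coll "
        "FILTER d.title == @title AND d.objType == @objType "
        "RETURN d"
    )
    return query, {"@coll": collection, "title": title, "objType": obj_type}


# Body that would be posted to /_api/index?collection=imported (assumed name).
index_body = json.dumps(persistent_index_definition(["title", "objType"]))

query, bind_vars = find_by_title_and_type("imported", "Acids", "topic")
```

Because every document carries objType, one shared collection can serve several entity types while the combined index narrows the candidate set before any traversal starts.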




solution1 1 2020-06-11 22:48:28
