I have 3 types of entity: subjects, topics and tasks.
Each subject contains topics and tasks. Topics can depend on each other. (Of course, a topic that belongs to subject sj1 can only depend on another topic that also belongs to sj1.)
Tasks and topics are linked by connections (which must also stay within the same subject) that express the fact that to solve a certain task we need to know certain topics.
So a task can require multiple topics, and a topic can be required by multiple tasks (an N:M relationship).
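To pin down the constraints described above, here is a minimal in-memory sketch in plain Python (not ArangoDB code). The class and method names (`Store`, `add_dependency`, `add_requirement`, `topics_for`, `tasks_for`) are hypothetical. It stores topic dependencies and task requirements as pairs, which makes the N:M shape explicit, and rejects any link that crosses subjects.

```python
from dataclasses import dataclass, field

# Hypothetical in-memory model of subjects/topics/tasks; not ArangoDB code.
@dataclass
class Doc:
    key: str
    subject: str  # key of the owning subject, e.g. "sj1"

@dataclass
class Store:
    topics: dict = field(default_factory=dict)    # topic key -> Doc
    tasks: dict = field(default_factory=dict)     # task key -> Doc
    depends_on: set = field(default_factory=set)  # (topic, prerequisite topic)
    requires: set = field(default_factory=set)    # (task, topic): the N:M link

    def add_topic(self, key: str, subject: str) -> None:
        self.topics[key] = Doc(key, subject)

    def add_task(self, key: str, subject: str) -> None:
        self.tasks[key] = Doc(key, subject)

    def add_dependency(self, topic: str, prerequisite: str) -> None:
        # A topic may only depend on another topic of the same subject.
        if self.topics[topic].subject != self.topics[prerequisite].subject:
            raise ValueError("topics belong to different subjects")
        self.depends_on.add((topic, prerequisite))

    def add_requirement(self, task: str, topic: str) -> None:
        # A task may only require topics of its own subject.
        if self.tasks[task].subject != self.topics[topic].subject:
            raise ValueError("task and topic belong to different subjects")
        self.requires.add((task, topic))

    def topics_for(self, task: str) -> set:
        # All topics a task requires (one side of the N:M link).
        return {t for k, t in self.requires if k == task}

    def tasks_for(self, topic: str) -> set:
        # All tasks that require a topic (the other side of the N:M link).
        return {k for k, t in self.requires if t == topic}
```

Each pair in `requires` plays the role an edge document would play in a graph database, so the same pair set answers both directions of the N:M question.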
What would be the best way to store this?
solution
solution
This way, if I want to search among the topics or tasks of a subject, I don't need to pre-filter them using a subject-identifier index: I immediately get the collection that contains exactly the data I want. Moreover, I avoid the overhead of indexing the subject field of every document in tasks and topics. On the other hand, this results in a mess of collections.
Side note: there will be at most 50 subjects, but the number of tasks and topics is unlimited.
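The trade-off in the question can be sketched in plain Python, assuming a hypothetical `topics_<subject>` / `tasks_<subject>` naming scheme for the per-subject layout. With the 50-subject cap, that layout already needs 100 document collections before any edge collections are counted; the shared layout instead keeps a single secondary index from subject to document keys (a dict here stands in for a database index).

```python
from collections import defaultdict

SUBJECTS = [f"sj{i}" for i in range(1, 51)]  # the question's cap: at most 50 subjects

def per_subject_collections(subjects):
    # Layout A: one topics and one tasks collection per subject
    # (collection names are hypothetical).
    return [f"{kind}_{s}" for s in subjects for kind in ("topics", "tasks")]

def build_subject_index(docs):
    # Layout B: shared collections; every document carries its subject,
    # and one secondary index maps subject -> document keys.
    index = defaultdict(list)
    for doc in docs:
        index[doc["subject"]].append(doc["_key"])
    return index
```

Layout A answers "all topics of sj1" by picking a collection; layout B answers it with one index lookup, at the cost of maintaining that index entry per document.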
In your terms, "awareness" is expressed through the graph, which needs no extra indexing to work at its best. ArangoDB automatically creates the primary index on "_key" and the edge index on "_from"/"_to", which it uses for graph traversal.
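As an analogy for why traversal needs no user-defined index, here is a small plain-Python sketch (not ArangoDB's implementation). Edges use the `_from`/`_to` shape with "collection/key" handles; the collection names and keys are hypothetical, and edges for both relationships (task requires topic, topic depends on topic) sit in one list for brevity. A dict keyed by `_from` stands in for the edge index, and a breadth-first walk collects every topic a task needs, including transitive prerequisites.

```python
from collections import defaultdict, deque

# Edge documents in the _from/_to shape; handles are "collection/key".
# Collection names and keys are hypothetical.
edges = [
    {"_from": "tasks/k1", "_to": "topics/t2"},   # task k1 requires topic t2
    {"_from": "topics/t2", "_to": "topics/t1"},  # topic t2 depends on topic t1
    {"_from": "tasks/k2", "_to": "topics/t1"},   # task k2 requires topic t1
]

def build_from_index(edges):
    # Stand-in for an edge index: outgoing neighbours looked up by _from.
    index = defaultdict(list)
    for e in edges:
        index[e["_from"]].append(e["_to"])
    return index

def reachable(start, index):
    # Breadth-first traversal over outbound edges, visiting each vertex once.
    seen, queue = {start}, deque([start])
    while queue:
        v = queue.popleft()
        for w in index.get(v, []):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    seen.discard(start)
    return seen
```

Each step of the walk is a direct lookup by `_from`, which is the role the automatically created edge index plays during a real traversal.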
As for indexing, that is all about search performance: indexes are added based on the data you want to find. It really comes down to how you want to search.
There is no penalty for having large collections, and a graph can link documents within a single collection; they don't need to be segregated. You can also have multiple edge collections and/or multiple document collections. These are some of the concepts that challenge those of us who, like me, come from a traditional RDBMS background: "schemaless" or "multi-model" databases kind of turn normalization on its ear.
Personally, I build fairly large collections based on the data source (I import data from external sources). Each collection contains documents of multiple object/data schemas, identified by an objType attribute. The benefit is that you can search all documents in the collection on a single field (or even on an index over multiple fields, like title + objType), very quickly reducing the set of documents to iterate over or traverse; this is usually where the real performance gains are made.
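A rough plain-Python illustration of the single-collection approach (an analogy, not ArangoDB's index implementation). Documents of different schemas share one list and are told apart by objType; the document values are hypothetical. A dict keyed by the (title, objType) tuple plays the part of a multi-field index, next to a full scan for comparison.

```python
from collections import defaultdict

# One shared collection; objType distinguishes the schemas (values are hypothetical).
docs = [
    {"_key": "1", "objType": "topic", "title": "Recursion", "subject": "sj1"},
    {"_key": "2", "objType": "task", "title": "Recursion", "subject": "sj1"},
    {"_key": "3", "objType": "topic", "title": "Sorting", "subject": "sj2"},
]

def build_compound_index(docs, fields):
    # Maps a tuple of field values to the matching keys, like a multi-field index.
    index = defaultdict(list)
    for doc in docs:
        index[tuple(doc[f] for f in fields)].append(doc["_key"])
    return index

def scan(docs, **criteria):
    # Without an index, every document has to be inspected.
    return [d["_key"] for d in docs if all(d.get(k) == v for k, v in criteria.items())]

idx = build_compound_index(docs, ("title", "objType"))
```

Both `idx[("Recursion", "topic")]` and `scan(docs, title="Recursion", objType="topic")` return `["1"]`, but the index lookup touches only the matching entry while the scan walks the whole collection.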
So... I guess I'd recommend solution #3?