Using ArangoDB / OrientDB for a hierarchical data model and document search: is it the right solution?

I'm developing document management software and I'm evaluating a NoSQL database for storing and searching data.

In summary, the software acts like a file system in which items are organized in directories and subdirectories.

Each item of the tree can have n properties used for filtering and sorting.

Items can also be connected to each other by other kinds of relations (other than parent-child).

The item count could be relatively large (some millions), and the killer feature of the application has to be constant performance when retrieving data (with filters and sorting by properties), independently of database growth.

I need 3 key features:

  • Get the direct children of a folder. The result must be pageable, sortable and filterable on each document property

  • Get all children of a folder (all items of the subtree). The result must be pageable, sortable and filterable on each document property

  • Get all parents of a folder

I'm a newbie in NoSQL and currently use an RDBMS (SQL Server), but I've hit performance issues and all the limits caused by a fixed schema for document properties. I'm evaluating ArangoDB or OrientDB because I think their features (document oriented and graph oriented) could be the best solution for my design needs.

Can you help me by suggesting a database design and the queries for these 3 tasks?

N.B. I need the query result to return a dataset with a column for each property:

E.g. doc1: p1: v1, p2: v2
     doc2: p1: v1, p3: v3

result:
    name | p1 | p2 | p3
    doc1   v1   v2   null
    doc2   v1   null v3
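
To make the wanted result shape concrete, here is a minimal Python sketch of the pivot, done in application code. It assumes each document is held as a dict with a `name` and a `properties` mapping; that shape and the function name `pivot_properties` are illustrative assumptions, not something either database provides.

```python
def pivot_properties(docs):
    """Turn documents with differing properties into rows sharing one column set."""
    # Collect every property name, keeping first-seen order for stable columns.
    columns = []
    for doc in docs:
        for prop in doc["properties"]:
            if prop not in columns:
                columns.append(prop)
    # A property a document does not have becomes None (the "null" cells above).
    rows = [
        {"name": doc["name"], **{c: doc["properties"].get(c) for c in columns}}
        for doc in docs
    ]
    return ["name"] + columns, rows


# Hypothetical sample data matching the example above.
docs = [
    {"name": "doc1", "properties": {"p1": "v1", "p2": "v2"}},
    {"name": "doc2", "properties": {"p1": "v1", "p3": "v3"}},
]
header, rows = pivot_properties(docs)
```

The column set is only known after scanning the documents in the result, so with paging the columns can differ from page to page unless they are fixed up front.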

I'm thinking of designing an item as:

{
  "_id": "_myItemId",
  "name": "Item1",
  "itemType": "root / folder / file",
  "parentItemId": "",
  "properties": [
    { "name": "Property1", "formatType": 0, "formatMask": "", "value": "Value1" },
    { "name": "Property2", "formatType": 0, "formatMask": "", "value": "Value2" },
    { "name": "Property3", "formatType": 0, "formatMask": "", "value": "Value3" }
  ]
}

Do you have any suggestions for a design able to solve the 3 key features described above?

Thanks

The approach with graph databases is very different from other kinds of DBMS. You can "connect" your entities (vertices) using edges, each a direct link between one entity and another. So, first of all, you don't need to store e.g. the "parentItemId" on each object like you would in a SQL or document database. Instead you will have two, three or more entity types holding only their specific data; relationships are handled by the edges you create between them.

OrientDB has very good documentation and some examples to help you start understanding the concepts. E.g. the tutorial page http://orientdb.com/docs/2.1/Tutorial-Working-with-graphs.html explains graph concepts and has some good examples.

In your specific case, you could have two entity types (vertex classes), Folder and Document, and an edge class that you call e.g. "ChildOf" (from Document to Folder) or "Contains" (from Folder to Document). Then there are many queries you can run to find relationships, even specifying the level of nesting etc.

You can create a working schema in the following steps:

1 Create the vertex and edge classes:

CREATE CLASS Document Extends V
CREATE CLASS Folder Extends V
CREATE CLASS ChildOf Extends E

2 Insert some documents

INSERT INTO Document SET Title = 'Document 1', Name = '..'
INSERT INTO Document SET Title = 'Document 2', Name = '..'
INSERT INTO Document SET Title = 'Document 3', Name = '..'

3 Insert Folders

INSERT INTO Folder SET Name = 'Folder 1'
INSERT INTO Folder SET Name = 'Folder 2'

4 Create edges (relationships) between vertices

CREATE EDGE ChildOf FROM #<specify document rid here> TO #<specify folder rid here>
...

You can also make a folder a child of another folder by creating the same "ChildOf" edge between two folders:

CREATE EDGE ChildOf FROM #<specify child folder rid here> TO #<specify parent folder rid here>
...

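5 Query your graph. Get the direct children of a folder, using the expand() and in() functions. ChildOf edges point from the child to its parent, so the children of a folder are reached through in('ChildOf'). A WHERE clause placed directly on the expand() query filters the starting folder rather than the expanded children, so wrap it in an outer query to filter the children:

SELECT FROM (
     SELECT expand(in('ChildOf')) FROM #<folder rid>
) WHERE ...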
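
As a rough illustration of what the direct-children step has to do (follow the incoming ChildOf edges, then filter, sort and page the result), here is a small in-memory Python sketch. It is not OrientDB code: the sample items, the `child_of` edge list and the `direct_children` helper are all made up for the example.

```python
from collections import defaultdict

# Hypothetical sample data: each ChildOf edge points from a child to its parent.
child_of = [
    ("doc1", "folder1"),
    ("doc2", "folder1"),
    ("doc3", "folder1"),
    ("folder2", "folder1"),
]
items = {
    "folder1": {"Name": "Folder 1"},
    "folder2": {"Name": "Folder 2"},
    "doc1": {"Name": "b.txt", "Title": "Document 1"},
    "doc2": {"Name": "a.txt", "Title": "Document 2"},
    "doc3": {"Name": "c.txt", "Title": "Document 3"},
}

# Index the incoming edges per parent; this is the set in('ChildOf') reads.
incoming = defaultdict(list)
for child, parent in child_of:
    incoming[parent].append(child)


def direct_children(folder, predicate=lambda item: True, sort_key="Name",
                    skip=0, limit=10):
    """Return one page of the folder's direct children, filtered and sorted."""
    children = [items[child] for child in incoming.get(folder, [])]
    matching = [item for item in children if predicate(item)]
    # Items lacking the sort property sort first (empty string).
    matching.sort(key=lambda item: item.get(sort_key) or "")
    return matching[skip:skip + limit]


# Second page (size 2, offset 1) of the children that have a Title, by Name.
page = direct_children("folder1", predicate=lambda i: "Title" in i,
                       skip=1, limit=2)
```

In the database the same three steps would normally be pushed into the query itself, so that only one page of rows leaves the server.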

Get all children of a folder, using a TRAVERSE query to walk all descendants from a starting folder. Since ChildOf edges point from child to parent, descendants are reached by following the edges backwards with in('ChildOf'):

SELECT FROM (
     TRAVERSE in('ChildOf') FROM #<folder rid> WHILE $depth <= 3 //you can specify the maximum level of nesting
) where $depth > 0 //exclude the first element (the starting folder itself)

Get all parents of a folder, using TRAVERSE and the out() graph function. Each ChildOf edge leads from a child to its parent, so out('ChildOf') walks up the tree:

SELECT FROM (
     TRAVERSE out('ChildOf') FROM #<folder rid>
) where $depth > 0 //exclude the first element (the starting folder itself)
  and @class = 'Folder' //keep only the "Folders"
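
To check which direction each traversal needs, here is a small in-memory Python sketch of the same idea. It assumes, as in the schema above, that every ChildOf edge points from a child to its parent; the sample tree and the `traverse` helper are hypothetical. It visits breadth-first for simplicity, so the order may differ from what the database returns.

```python
from collections import defaultdict, deque

# Hypothetical sample tree; each ChildOf edge points from a child to its parent.
child_of = [
    ("folder2", "folder1"),
    ("doc1", "folder1"),
    ("doc2", "folder2"),
    ("folder3", "folder2"),
    ("doc3", "folder3"),
]

incoming = defaultdict(list)  # parent -> children, the in('ChildOf') direction
outgoing = defaultdict(list)  # child -> parents, the out('ChildOf') direction
for child, parent in child_of:
    incoming[parent].append(child)
    outgoing[child].append(parent)


def traverse(start, neighbours, max_depth=None):
    """Breadth-first walk returning (item, depth) pairs, excluding the start."""
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        item, depth = queue.popleft()
        if depth > 0:  # skip the starting element itself
            found.append((item, depth))
        if max_depth is not None and depth >= max_depth:
            continue  # do not descend below the requested nesting level
        for nxt in neighbours.get(item, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return found


descendants = traverse("folder1", incoming)            # whole subtree
ancestors = traverse("folder3", outgoing)              # all parents
first_level = traverse("folder1", incoming, max_depth=1)
```

The subtree walk follows incoming edges and the parents walk follows outgoing edges; swapping the two returns the opposite side of the tree.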
