Using ArangoDB / OrientDB for a hierarchical data model and document search: is it the right solution?

I'm developing document management software and I'm evaluating a NoSQL database for storing and searching data.

In summary, the software acts like a file system in which items are organized in directories and subdirectories.

Each item of the tree can have n properties used for filtering and sorting.

Items may also be connected to each other by other kinds of relations (other than parent-child).

The item count could be relatively large (some millions), and the killer feature of the application has to be constant performance when retrieving data (with filters and sorting by properties), independently of database growth.

I need 3 key features:

  • Get the direct children of a folder. The result must be pageable, sortable and filterable by each document property.

  • Get all children of a folder (all items of the subtree). The result must be pageable, sortable and filterable by each document property.

  • Get all parents of a folder.

I'm a newbie in NoSQL. I currently use an RDBMS (SQL Server), but I've hit performance issues and all the limits caused by a fixed schema for document properties. I'm evaluating ArangoDB or OrientDB because I think their features (document oriented and graph oriented) could be the best solution for my design needs.

Can you help me with a suggestion for designing the database and the queries for these 3 tasks?

N.B. I need the result of the query to return a dataset with a column for each property:

E.g. doc1: p1: v1, p2: v2
     doc2: p1: v1, p3: v3

result:
    name | p1 | p2 | p3
    doc1   v1   v2   null
    doc2   v1   null v3
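
To make the expected result shape concrete, here is a minimal Python sketch of the pivot, independent of any database. The `to_table` helper and the `props` dictionary layout are hypothetical names used only for illustration; a missing property becomes `None`, playing the role of null.

```python
def to_table(docs):
    """Pivot documents with different property sets into rows sharing one header."""
    columns = []
    for doc in docs:
        for prop_name in doc["props"]:
            if prop_name not in columns:
                columns.append(prop_name)  # keep first-seen order of properties
    rows = []
    for doc in docs:
        # dict.get returns None when a document lacks that property
        rows.append([doc["name"]] + [doc["props"].get(c) for c in columns])
    return ["name"] + columns, rows


# Hypothetical sample data mirroring doc1 and doc2 above.
docs = [
    {"name": "doc1", "props": {"p1": "v1", "p2": "v2"}},
    {"name": "doc2", "props": {"p1": "v1", "p3": "v3"}},
]
header, rows = to_table(docs)
```

The header is the union of all property names, so it can only be computed after looking at every document of the page being returned.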

I'm thinking of designing an item as:

{
  "_id": "_myItemId",
  "name": "Item1",
  "itemType": "root / folder / file",
  "parentItemId": "",
  "properties": [
    { "name": "Property1", "formatType": 0, "formatMask": "", "value": "Value1" },
    { "name": "Property2", "formatType": 0, "formatMask": "", "value": "Value2" },
    { "name": "Property3", "formatType": 0, "formatMask": "", "value": "Value3" }
  ]
}

Do you have any suggestions for a design able to support the 3 key features described above?

Thanks

The approach with graph databases is very different from other kinds of DBMS. You can "connect" your entities (vertices) using edges, a direct link between one entity and another. So, first of all, you don't need to store e.g. the "parentItemId" for each object as you would in a SQL or document database; instead you will have two, three or many entities holding only their specific data, and the relationships will be handled by the edges you create between them.

OrientDB has very good documentation and some examples to start understanding the concepts. E.g. the tutorial page http://orientdb.com/docs/2.1/Tutorial-Working-with-graphs.html explains graph concepts and has some good examples.

In your specific case, you could have two entity types (vertex classes), Folder and Document, and an edge that you call e.g. "ChildOf" (from Document to Folder) or "Contains" (from Folder to Document). Then there are many queries you can run to find relationships, even specifying the level of nesting etc.
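
The direction of the edge matters for every query that follows, so here is a small Python sketch of the idea. It is only an in-memory model, not OrientDB code; the `child_of` list and both helper functions are hypothetical. Each "ChildOf" edge points from the child to its parent, so following edges out of a vertex goes up the tree and following edges into it goes down.

```python
# Hypothetical edge list: (source, target) means "source ChildOf target".
child_of = [
    ("doc1", "folderA"),
    ("doc2", "folderA"),
    ("folderA", "root"),
]


def out_neighbors(vertex):
    """Vertices reached by edges leaving `vertex`: its parent(s)."""
    return [dst for src, dst in child_of if src == vertex]


def in_neighbors(vertex):
    """Vertices whose edges arrive at `vertex`: its direct children."""
    return [src for src, dst in child_of if dst == vertex]
```

With a "Contains" edge the roles would simply be reversed: out would lead to the children and in to the parent.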

You can create a working schema with the following steps:

1 Create the vertex classes and the edge type:

CREATE CLASS Document Extends V
CREATE CLASS Folder Extends V
CREATE CLASS ChildOf Extends E

2 Insert some documents

INSERT INTO Document SET Title = 'Document 1', Name = '..'
INSERT INTO Document SET Title = 'Document 2', Name = '..'
INSERT INTO Document SET Title = 'Document 3', Name = '..'

3 Insert folders

INSERT INTO Folder SET Name = 'Folder 1'
INSERT INTO Folder SET Name = 'Folder 2'

4 Create edges (relationships) between vertices

CREATE EDGE ChildOf FROM #<specify document rid here> TO #<specify folder rid here>
...

You can also make a folder a child of another folder, by creating the same "ChildOf" edge between the two folders:

CREATE EDGE ChildOf FROM #<specify child folder rid here> TO #<specify parent folder rid here>
...

5 Query your graph. Get the direct children of a folder, using the expand() and in() functions:

Select expand(in('ChildOf')) From #<folder rid> Where ...
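
The question asks for results that are filterable, sortable and pageable. In the database this would be done by the query itself (the Where clause, plus ordering and paging clauses such as ORDER BY, SKIP and LIMIT in OrientDB SQL). As a rough sketch of the order of operations only, here is the same thing in plain Python; `page_children` and the sample data are hypothetical.

```python
def page_children(children, predicate, sort_key, page, page_size):
    """Filter, then sort, then slice out one page (page numbers start at 0)."""
    matching = sorted((c for c in children if predicate(c)), key=sort_key)
    start = page * page_size
    return matching[start:start + page_size]


# Hypothetical direct children of one folder.
children = [
    {"name": "doc3", "p1": "v1"},
    {"name": "doc1", "p1": "v1"},
    {"name": "doc2", "p1": "other"},
    {"name": "doc4", "p1": "v1"},
]


def has_v1(item):
    return item["p1"] == "v1"


def by_name(item):
    return item["name"]


first_page = page_children(children, has_v1, by_name, page=0, page_size=2)
second_page = page_children(children, has_v1, by_name, page=1, page_size=2)
```

Filtering and sorting have to happen before the page is cut, otherwise the pages are not stable.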

Get all children of a folder, using a TRAVERSE query to walk all children from a starting folder:

SELECT FROM (
     // ChildOf edges point from child to parent, so in() walks down to the children
     TRAVERSE in('ChildOf') FROM #<folder rid> WHILE $depth <= 3 //you can specify the maximum level of nesting
) where $depth > 0 //exclude the first element (the starting folder itself)
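
Conceptually, a depth-limited traversal like this one visits the starting folder at depth 0, its children at depth 1, and so on up to the limit. Below is a minimal Python sketch of that idea over a hypothetical in-memory `children` map. It uses a breadth-first walk, which is an assumption made for readability; the order in which the database visits items may differ.

```python
from collections import deque

# Hypothetical tree: folder name -> names of its direct children.
children = {"root": ["a", "b"], "a": ["a1"], "a1": ["a1x"], "b": []}


def subtree(start, max_depth):
    """Return (item, depth) pairs below `start`, down to `max_depth` levels."""
    found = []
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth > 0:  # skip the starting folder itself
            found.append((node, depth))
        if depth < max_depth:  # stop descending at the depth limit
            for child in children.get(node, []):
                queue.append((child, depth + 1))
    return found


two_levels = subtree("root", 2)
```

Here "a1x" sits at depth 3, so it is left out when the limit is 2.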

Get all parents of a folder, using TRAVERSE and the out() graph function (ChildOf edges point from child to parent, so out() walks up towards the root):

SELECT FROM (
    TRAVERSE out('ChildOf') FROM #<folder rid>
) where $depth > 0 //exclude the first element (the starting folder itself)
  and @class = 'Folder' //here you keep only the "Folders"
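
For comparison, walking up to the root is just a loop when every item has at most one parent. This Python sketch assumes exactly that (a single parent per item and no cycles); the `parent_of` map and the `ancestors` function are hypothetical.

```python
# Hypothetical map: item name -> name of its parent folder.
parent_of = {"a1x": "a1", "a1": "a", "a": "root"}


def ancestors(item):
    """Return the chain of parents from the nearest one up to the root."""
    chain = []
    while item in parent_of:  # the root has no entry, which ends the loop
        item = parent_of[item]
        chain.append(item)
    return chain


path_to_root = ancestors("a1x")
```

The length of the chain is the depth of the item, so this lookup depends on how deep the tree is, not on how many items it holds.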
