Graph database or relational database for tree structure data

I have company shareholding data, organized by year, with a hierarchical structure. For example, company A holds 50% of B, B holds 50% of C, and D holds 50% of C. Each firm has its own properties, such as industry.

There are few write operations; the workload is mostly reads. Specifically, starting from a set of root nodes, I want to extract the family tree by tracing down the ownership links, subject to a certain percentage-share threshold. There are several metrics of interest in the family tree.

For each node:对于每个节点:

  1. the depth from the root
  2. the product of the shares, layer by layer, from the root; e.g., A holds 0.5 × 0.5 = 25% of C.
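To make the per-node metrics concrete, here is a minimal Python sketch of the downward trace. It assumes the holdings are available as a hypothetical in-memory edge list of (owner, owned, share) tuples with shares as fractions; `trace_down`, `threshold` and `max_depth` are illustrative names. It also assumes the threshold applies to each individual link (comparing `share * link_share` instead would apply it to the cumulative share), and it ignores the year dimension.

```python
from collections import defaultdict

# Hypothetical edge list: (owner, owned, share as a fraction), from the example.
HOLDINGS = [("A", "B", 0.5), ("B", "C", 0.5), ("D", "C", 0.5)]


def trace_down(roots, holdings, threshold=0.0, max_depth=10):
    """Yield (root, node, depth, cumulative_share) for every path below each root.

    Assumption: links whose own share is below `threshold` are not followed.
    `max_depth` stops the walk if cross-holdings ever form a cycle.
    """
    children = defaultdict(list)
    for owner, owned, share in holdings:
        children[owner].append((owned, share))

    for root in roots:
        stack = [(root, 0, 1.0)]  # (company, depth, product of shares so far)
        while stack:
            node, depth, share = stack.pop()
            if depth > 0:
                yield root, node, depth, share
            if depth == max_depth:
                continue
            for child, link_share in children[node]:
                if link_share >= threshold:
                    stack.append((child, depth + 1, share * link_share))
```

For roots A and D with a 30% threshold, `sorted(trace_down(["A", "D"], HOLDINGS, threshold=0.3))` returns `[("A", "B", 1, 0.5), ("A", "C", 2, 0.25), ("D", "C", 1, 0.5)]`. Each path gets its own record, so a company reached through two different paths from the same root appears twice.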

For each level:对于每个级别:

  1. the distribution of shares from each root
  2. the distribution of industries
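The per-level metrics can then be aggregated from per-path records. The sketch below is hypothetical: it assumes records shaped as (root, node, depth, cumulative_share), hard-codes the records for the A/B/C/D example, and borrows the same industries used in the InfiniteGraph answer below; `level_summary` is an invented name. For the industry distribution it counts each company once per level.

```python
from collections import Counter, defaultdict

# Hypothetical per-path records (root, node, depth, cumulative_share),
# e.g. what a downward trace from roots A and D would produce.
RECORDS = [("A", "B", 1, 0.5), ("A", "C", 2, 0.25), ("D", "C", 1, 0.5)]
# Illustrative industries for the example companies.
INDUSTRY = {"A": "Manufacturing", "B": "Manufacturing",
            "C": "Retail", "D": "Construction"}


def level_summary(records, industry):
    """Per depth: shares reached from each root, and industry counts."""
    shares = defaultdict(lambda: defaultdict(list))  # depth -> root -> [shares]
    companies = defaultdict(set)                     # depth -> companies seen
    for root, node, depth, share in records:
        shares[depth][root].append(share)
        companies[depth].add(node)
    # Count each company once per level, even if several roots reach it there.
    industries = {depth: Counter(industry[c] for c in names)
                  for depth, names in companies.items()}
    return shares, industries
```

At depth 1, both A and D hold 50% of a company (B and C respectively), and the industry distribution is one Manufacturing and one Retail company.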

Note that each node may have multiple roots, and we are interested in all of them.
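Going the other way, all roots of a given node can be found by walking up the owner links until reaching companies that nobody owns. This is a sketch under the same hypothetical edge-list assumption; `roots_of` is an illustrative name, and the `seen` set guards against cross-holding cycles.

```python
from collections import defaultdict

# Hypothetical edge list: (owner, owned, share as a fraction), from the example.
HOLDINGS = [("A", "B", 0.5), ("B", "C", 0.5), ("D", "C", 0.5)]


def roots_of(node, holdings):
    """Return every ultimate owner above `node` (companies with no owners)."""
    owners = defaultdict(list)
    for owner, owned, _share in holdings:
        owners[owned].append(owner)

    roots, seen, stack = set(), set(), [node]
    while stack:
        current = stack.pop()
        if current in seen:  # guards against cross-holding cycles
            continue
        seen.add(current)
        if not owners[current] and current != node:
            roots.add(current)
        stack.extend(owners[current])
    return roots
```

For the example data, `roots_of("C", HOLDINGS)` returns `{"A", "D"}`, while a company that nobody owns gets an empty set.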

For now, the data is stored in a relational database, and the task described above is done through joins. Would a graph database such as Neo4j be more suitable for this data and this task? The crux of the problem is to have a proper index so that joining is not necessary every time. Any suggestions and pointers would be greatly appreciated.
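For comparison, in a relational database the per-level joins do not have to be written out by hand: a recursive common table expression (`WITH RECURSIVE`, supported by SQLite, PostgreSQL and others) repeats the join until no new rows appear. The sketch below uses Python's built-in `sqlite3` module; the `holding` table and its columns, the hard-coded roots A and D, and the per-link threshold are assumptions, and the year dimension is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per holding link, share stored as a fraction.
conn.executescript("""
CREATE TABLE holding (owner TEXT, owned TEXT, share REAL);
CREATE INDEX holding_owner ON holding (owner);
INSERT INTO holding VALUES ('A', 'B', 0.5), ('B', 'C', 0.5), ('D', 'C', 0.5);
""")

# The recursive part joins the rows found so far to their holdings,
# multiplying shares and incrementing depth, until no new rows appear.
QUERY = """
WITH RECURSIVE tree(root, node, depth, share) AS (
    SELECT owner, owned, 1, share
    FROM holding
    WHERE owner IN ('A', 'D') AND share >= :threshold
    UNION ALL
    SELECT t.root, h.owned, t.depth + 1, t.share * h.share
    FROM tree AS t
    JOIN holding AS h ON h.owner = t.node
    WHERE h.share >= :threshold AND t.depth < :max_depth
)
SELECT root, node, depth, share FROM tree ORDER BY root, depth, node
"""

rows = conn.execute(QUERY, {"threshold": 0.3, "max_depth": 10}).fetchall()
```

The index on `holding(owner)` lets each recursive step look up the next layer instead of scanning the whole table.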

Just about any graph database can model the information you are describing. How you construct the queries to get the results you want will differ from product to product.

In InfiniteGraph, we can model the information using the following schema:

UPDATE SCHEMA {
    CREATE CLASS Company {
        name        : String,
        industry    : String, 
        
        owns        : LIST {
                        element: Reference {
                            edgeClass       : Owns,
                            edgeAttribute   : owns
                        },
                        CollectionTypeName  : SegmentedArray
                    },
        ownedBy     : LIST {
                        element: Reference {
                            edgeClass       : Owns,
                            edgeAttribute   : ownedBy
                        },
                        CollectionTypeName  : SegmentedArray
                    }
        
    }
    
    CREATE CLASS Owns
    {
        percentage  : Real { Storage: B32 },
        owns        : Reference {referenced: Company, inverse: ownedBy },
        ownedBy     : Reference {referenced: Company,  inverse: owns }
    }
};

Then we can load the data from your question:

LET coA = CREATE Company { name: "A", industry: "Manufacturing" };
LET coB = CREATE Company { name: "B", industry: "Manufacturing" };
LET coC = CREATE Company { name: "C", industry: "Retail" };
LET coD = CREATE Company { name: "D", industry: "Construction" };

CREATE Owns { owns: $coB, ownedBy: $coA, percentage: 50.00 };
CREATE Owns { owns: $coC, ownedBy: $coB, percentage: 50.00 };
CREATE Owns { owns: $coC, ownedBy: $coD, percentage: 50.00 };

Finally, we can define a weight calculator that assigns each edge a weight of 1/percentage. These weights are summed along a path, and the query then inverts the sum to obtain the ownership percentage. For the two 50% links from A to C, this gives 1/(1/50 + 1/50) = 25, which matches 0.5 × 0.5 = 25%. Note, however, that inverting a sum of reciprocals equals the product of the shares only in special cases like this one: for two links, 1/(1/a + 1/b) = ab/(a + b), which equals the product ab/100 only when a + b = 100.

CREATE WEIGHT CALCULATOR wcOwnership {
    minimum:    0,
    default:    0, 
    edges: {
        (:Company)-[ow:Owns]->(:Company): 1/ow.percentage
    }
};

The "edges" section defines the edge patterns to match and the computation that produces the weight for each matching edge. In InfiniteGraph, the edge weight does not have to be a stored attribute; it can be a simple attribute or the result of a complex computation based on the contents of one or more objects.
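A quick numeric check of that weight arithmetic, written as a Python sketch rather than InfiniteGraph code: it compares the true product of shares with what inverting a sum of 1/percentage weights yields, and with the standard alternative of an additive -log(share) weight, whose exponentiated sum recovers the product exactly. Whether InfiniteGraph's weight-calculator expressions offer a log function is an assumption not verified here; the function names are illustrative.

```python
import math


def product_share(percentages):
    """Ownership along a path: multiply the per-link shares as fractions."""
    result = 1.0
    for p in percentages:
        result *= p / 100
    return 100 * result


def inverted_reciprocal_sum(percentages):
    """What summing 1/percentage edge weights and inverting the total gives."""
    return 1 / sum(1 / p for p in percentages)


def log_weight_share(percentages):
    """Additive weights -log(share); exponentiating the sum recovers the product."""
    total_weight = sum(-math.log(p / 100) for p in percentages)
    return 100 * math.exp(-total_weight)
```

For the path [50, 50] all three functions give 25. For [80, 50], `product_share` and `log_weight_share` give 40, while `inverted_reciprocal_sum` gives 400/13, about 30.77. Because -log(share) is non-negative and grows as the share shrinks, a maximum total weight of -log(t) corresponds to a minimum cumulative share t, which is one way a share threshold could map onto a weight limit.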

On the given data, we can use the weight calculator to query from the target company (C) up the hierarchy. For each owner discovered, direct or indirect (which includes the roots), we can display the target (C), the percentage of ownership, the length of the path, and the name of the owning company. This particular query only goes 1 to 10 degrees ([*1..10]), but that range can be expanded as necessary.

  DO> Match m = max weight 1000.0 wcOwnership 
                    ((cTarget:Company {name == 'C'})-[*1..10]->(cRoot:Company)) 
                     return cTarget.name, 
                            1/Weight(m) as PercentageOwnership, 
                            Length(m), 
                            cRoot.name;

{
  _Projection
  {
    cTarget.name:'C',
    PercentageOwnership:50.0000,
    Length(m):1,
    cRoot.name:'B'
  },
  _Projection
  {
    cTarget.name:'C',
    PercentageOwnership:50.0000,
    Length(m):1,
    cRoot.name:'D'
  },
  _Projection
  {
    cTarget.name:'C',
    PercentageOwnership:25.0000,
    Length(m):2,
    cRoot.name:'A'
  }
}  

This approach captures every direct and indirect owner of the company in question, including all of its root nodes.


Neo4j can be a good fit here.

Neo4j does use indexes to find starting points in the graph, such as your root nodes. If you use an index only to look up the root nodes, that is a single index lookup for the entire query.

From there, traversing the tree is just traversing relationships, which amounts to pointer hopping: node reference -> relationship reference -> node reference, and so on. No joins are involved. You then compute the percentages from the share values along each path.
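As a toy illustration of that access pattern (not Neo4j's actual storage format or API), the Python sketch below does one dictionary lookup, standing in for the index, to find the root, then follows direct object references from node to relationship to node while multiplying shares. The `Node`, `Rel`, `connect`, `INDEX` and `paths_from` names are hypothetical, and `max_depth` serves as a guard against ownership cycles.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    name: str
    # Direct references to outgoing relationship objects (no join table).
    outgoing: list = field(default_factory=list)


@dataclass(eq=False)
class Rel:
    start: Node
    end: Node
    share: float


def connect(owner, owned, share):
    """Create a relationship and store a reference to it on the owner node."""
    rel = Rel(owner, owned, share)
    owner.outgoing.append(rel)
    return rel


# Stands in for the index used only to find starting points.
INDEX = {name: Node(name) for name in "ABCD"}
connect(INDEX["A"], INDEX["B"], 0.5)
connect(INDEX["B"], INDEX["C"], 0.5)
connect(INDEX["D"], INDEX["C"], 0.5)


def paths_from(root_name, max_depth=10):
    """One index lookup, then pointer hops node -> relationship -> node."""
    stack = [(INDEX[root_name], 0, 1.0)]
    while stack:
        node, depth, share = stack.pop()
        if depth == max_depth:
            continue
        for rel in node.outgoing:
            reached = share * rel.share
            yield rel.end.name, depth + 1, reached
            stack.append((rel.end, depth + 1, reached))
```

Here `list(paths_from("A"))` yields B at depth 1 with 50% and C at depth 2 with 25%, without ever consulting the index after the first lookup.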
