
Is there an efficient algorithm to detect dependency cycles in complex data structures?

Asked 2012-08-19 10:32:37. Tags: algorithm, language-agnostic, dependencies, cycle

I have a smallish, complex database (a few million records split over a low number of thousands of tables). The records can be thought of as business rules. Users can define their own rules in terms of existing rules (including other user-defined rules). These rules depend on other rules, sometimes via complex paths. The dependencies form an extended network rather than a hierarchy.

I am looking for an algorithm that determines, for a newly defined rule (or set of rules), whether the new rule is itself cyclic or whether it creates cycles when taken together with existing rules.

I need an algorithm that is efficient in the following circumstances:

The result of the algorithm needs only to be a boolean - true if there is a cycle, false otherwise.
The existing database can be assumed to be cycle free.
Processing can stop as soon as a cycle is found. The usual case (95% ??) will be that there is no cycle. Unfortunately, this is precisely the case where (I think) processing will have to follow all possible paths from the proposed new rule in order to determine that there is no cycle.
This algorithm is to be used to validate new user-defined rules as they are entered into the database. It needs to be as quick as possible for the usual case - I don't want this validation to become a bottleneck in the creation process.
Obtaining data is comparatively expensive - usually involving one or more queries, some of which are quite complex. The newly defined rule set can be constrained so as to be completely available in memory. If there are any other constraints on the input of new rules that would make this check more efficient, I am not aware of what they may be.
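
Since the new rule set can be held entirely in memory, the first half of the check (is the new rule set itself cyclic?) can run before any database query. The following is a minimal sketch of that step only, using a standard three-state depth-first search. The function name `has_cycle` and the dictionary representation of the rule set are assumptions made for illustration; they do not come from the question.

```python
def has_cycle(new_rules):
    """Return True if the in-memory rule set contains a dependency cycle.

    new_rules is a hypothetical mapping: rule name -> iterable of rule names
    it depends on directly. Names that are not keys (for example, existing
    rules in the database) are treated as leaves here.
    """
    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state = {}

    def visit(rule):
        state[rule] = IN_PROGRESS
        for dep in new_rules.get(rule, ()):
            dep_state = state.get(dep, UNVISITED)
            if dep_state == IN_PROGRESS:
                # We came back to a rule that is still on the current path.
                return True
            if dep_state == UNVISITED and visit(dep):
                return True
        state[rule] = DONE
        return False

    # any() short-circuits, so processing stops at the first cycle found.
    return any(state.get(r, UNVISITED) == UNVISITED and visit(r)
               for r in new_rules)
```

Each rule and each dependency edge is examined at most once. The recursion is acceptable for a small in-memory rule set; a very deep dependency chain would need an explicit stack instead.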

EDIT

I am accepting Nick's answer, with one modification. Storing the dependencies is a very easy change to the database. I am only going to store the direct dependencies, rather than all dependencies whether direct or indirect. I can view the two sets of dependencies, C, D, F, G and X, Y, Z (in Nick's answer), as tree structures, and use one of the various techniques for deriving hierarchical structures from a single-level dependency table. I think the cost of this will be acceptable in this context.
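
One possible shape for this modified check, sketched under assumptions: only direct dependencies are stored, and the indirect ones are derived by walking the table from the new rule's dependencies. The names `creates_cycle` and `fetch_direct_deps` are hypothetical; `fetch_direct_deps` stands in for whatever query returns the direct dependencies of one rule. Because the existing data is assumed cycle free, any new cycle has to pass through the new rule, so the walk only needs to look for the new rule itself or for one of the rules that will depend on it.

```python
def creates_cycle(new_rule, new_deps, new_dependents, fetch_direct_deps):
    """Return True if adding new_rule would close a dependency cycle.

    new_deps:          rules that new_rule depends on directly
    new_dependents:    rules that will depend directly on new_rule
    fetch_direct_deps: hypothetical callable, rule -> iterable of the rules
                       it depends on directly (e.g. one database query)
    """
    targets = {new_rule} | set(new_dependents)
    stack = list(new_deps)
    seen = set()
    while stack:
        rule = stack.pop()
        if rule in targets:
            return True          # stop as soon as a cycle is found
        if rule in seen:
            continue             # already expanded, no second query
        seen.add(rule)
        stack.extend(fetch_direct_deps(rule))
    return False
```

The `seen` set means each rule is fetched at most once, which matters when every fetch is a query. In the no-cycle case the walk still visits everything reachable from the new rule's dependencies.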


1 Answer

I hope I understood your problem correctly:

Let's assume you add rule A to the database; then you also add dependency information such as: A depends on C, D, F, G, and X, Y, Z depend on A.

I would assume there is no way of detecting a cycle at insertion time without really looking at the whole structure, which you say is disallowed.

So my idea would be to have everything precomputed and stored, i.e. for each rule R, store all other rules it depends on (not only directly, but also indirectly). Now when you insert rule A, simply get all dependencies of C, D, F, G and see if they include any of X, Y, Z or A. If they don't, there is no cycle, and you can safely add A to your ruleset and store all the dependencies of C, D, F, G, plus C, D, F, G themselves, as A's dependencies.
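
A minimal sketch of this precomputed approach, under stated assumptions: the stored dependency sets are modelled as an in-memory dictionary called `closure`, and the function name `try_add_rule` is made up for illustration. The last step, which pushes the new rule's dependencies onto the rules that depend on it, is not spelled out above; it is included here as an assumption so that the stored sets stay complete after the insert.

```python
def try_add_rule(closure, new_rule, depends_on, dependents):
    """Insert new_rule if it creates no cycle; return True on success.

    closure:    hypothetical mapping, rule -> set of ALL rules it depends on
                (direct and indirect), assumed complete and cycle free
    depends_on: rules new_rule depends on directly (C, D, F, G above)
    dependents: rules that depend directly on new_rule (X, Y, Z above)
    """
    depends_on, dependents = set(depends_on), set(dependents)

    # Everything new_rule would depend on: its direct dependencies plus
    # everything already stored for them.
    reachable = set(depends_on)
    for dep in depends_on:
        reachable |= closure.get(dep, set())

    # Cycle if new_rule reaches itself or any rule that depends on it.
    if new_rule in reachable or new_rule in dependents or reachable & dependents:
        return False

    closure[new_rule] = reachable

    # Assumption: keep the stored sets complete by updating the dependents
    # of new_rule and every rule that already depends on one of them.
    for d in dependents:
        closure.setdefault(d, set())
    inherited = reachable | {new_rule}
    for rule, deps in closure.items():
        if rule in dependents or deps & dependents:
            deps |= inherited
    return True
```

The check itself costs one lookup per direct dependency and no traversal, which is what makes it fast in the usual no-cycle case. The price is paid in storage and in the update step after each insert.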