

What computer science concept deals with combining multiple conditions for efficient evaluation?

Original 2016-07-16 23:02:44 6 2 tree/ conditional-statements/ computer-science/ boolean-logic/ evaluation

I have a real-world problem I'm trying to solve.

There's an application that processes incoming requests from a network socket.

The requests consist of multiple attribute-value pairs.

I want the administrator to be able to filter the debug logging generated by processing those requests, based on the attribute-value pairs in the request.

The administrator would enter multiple conditions, with each condition mapping to a different output stream, e.g.:

if ((&user == 'bar') && (&host == 'foo') && (&ip == 192.168.0.1)) -> write debug to fd 9
if ((&host == 'baz') && (&user == 'bar')) -> write debug to fd 10

Efficient evaluation of the above conditions requires that &user only be evaluated once, e.g. if (&user != 'bar') then we can stop processing.

It's obvious some sort of tree structure is needed...
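
One way to picture such a tree is a prefix tree over the individual tests. The following is only a minimal Python sketch, assuming every rule is a pure conjunction of attribute == value tests; `RULES`, `build_trie`, and `match` are hypothetical names made up for illustration, not part of any library.

```python
from collections import Counter

# Each rule: (set of (attribute, expected_value) tests, output fd).
RULES = [
    ({("user", "bar"), ("host", "foo"), ("ip", "192.168.0.1")}, 9),
    ({("host", "baz"), ("user", "bar")}, 10),
]


def build_trie(rules):
    """Build a prefix tree so tests shared by several rules are stored once."""
    # Order tests by how many rules use them (most shared first), so common
    # tests such as user == 'bar' end up near the root.
    freq = Counter(test for tests, _ in rules for test in tests)
    root = {"children": {}, "fds": []}
    for tests, fd in rules:
        node = root
        for test in sorted(tests, key=lambda t: (-freq[t], t)):
            node = node["children"].setdefault(
                test, {"children": {}, "fds": []}
            )
        node["fds"].append(fd)
    return root


def match(node, request):
    """Return (matching fds, number of tests evaluated) for one request."""
    fds = list(node["fds"])
    evaluated = 0
    for (attr, expected), child in node["children"].items():
        evaluated += 1
        if request.get(attr) == expected:
            sub_fds, sub_evaluated = match(child, request)
            fds += sub_fds
            evaluated += sub_evaluated
    return fds, evaluated


trie = build_trie(RULES)
```

With this layout, a request whose user is not 'bar' is rejected after a single test, because both rules hang below the shared user == 'bar' node.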

I should also mention that in the scenario I'm describing, there are no side effects from condition evaluation (you cannot perform assignment), so operands to most of the logical operators can be reordered without issue.

What is the computer science concept that deals with this problem? It has the smell of something NP-complete.

Update: Follow-up question. Are there any C libraries or expression languages such as BPF that could help solve the real-world problem, or provide a generic implementation of the computer science concept?

2 answers

I don't know whether this qualifies as a "computer science concept", but you could tackle your issue with techniques from data flow planning / synthesis / scheduling. This is especially helpful when your operations have different costs associated with them (pattern matching on a string is probably far more expensive than a bit-exact comparison of a byte).

Basically every "atomic" condition (like user == 'bar') would become a node in what's (I think) called a sequence graph. Compound conditions (&& and probably || and !) then become further nodes in this (directed) graph, with an edge from each of their operands (nodes) to them.
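
To make the graph idea concrete, here is a small Python sketch of such a directed graph in which identical conditions share a single node. It is an illustration under assumptions: the `Graph` class, its method names, and the "eq"/"and"/"or"/"not" operator labels are all made up for this example.

```python
class Graph:
    """Expression graph in which identical sub-expressions share one node."""

    def __init__(self):
        self.nodes = []   # node id -> (op, operands)
        self.index = {}   # (op, operands) -> node id

    def node(self, op, *operands):
        """Return the id of the node for (op, operands), creating it once."""
        key = (op, operands)
        if key not in self.index:
            self.index[key] = len(self.nodes)
            self.nodes.append(key)
        return self.index[key]

    def atom(self, attr, value):
        """Atomic condition: attr == value."""
        return self.node("eq", attr, value)

    def evaluate(self, request):
        """Evaluate every node exactly once, in creation order.

        Operands are always created before the nodes that use them, so
        creation order is a valid topological order of the graph.
        """
        results = []
        for op, operands in self.nodes:
            if op == "eq":
                attr, value = operands
                results.append(request.get(attr) == value)
            elif op == "and":
                results.append(all(results[i] for i in operands))
            elif op == "or":
                results.append(any(results[i] for i in operands))
            else:  # "not"
                results.append(not results[operands[0]])
        return results


g = Graph()
user_bar = g.atom("user", "bar")
rule9 = g.node("and", user_bar, g.atom("host", "foo"),
               g.atom("ip", "192.168.0.1"))
# The second rule reuses the existing user == 'bar' node.
rule10 = g.node("and", g.atom("host", "baz"), g.atom("user", "bar"))
```

Note that this sketch evaluates every node and does no short-circuiting; it only shows how sharing guarantees that user == 'bar' is computed once per request.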

Nodes can be given "durations", e.g. an atomic string comparison takes 20 time units, whereas an && of multiple (already evaluated) conditions takes only 1 time unit.

You can then apply different scheduling algorithms to this graph. Candidates (that I know of) are ASAP (As Soon As Possible), ALAP (As Late As Possible), "List Scheduling" and Force Directed Scheduling.
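
As a rough illustration of the simplest of these, the following Python sketch computes an ASAP schedule (earliest start times, assuming unlimited resources) for a graph built from the two example rules. The node names, the `asap_schedule` helper, and the duration of 5 for the IP comparison are assumptions for this example; the durations 20 and 1 follow the figures mentioned above.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# node -> set of predecessor nodes (the operands it depends on)
PREDS = {
    "rule9": {"user_bar", "host_foo", "ip_match"},
    "rule10": {"user_bar", "host_baz"},
}

# Assumed durations in abstract time units.
DURATION = {
    "user_bar": 20, "host_foo": 20, "host_baz": 20,
    "ip_match": 5, "rule9": 1, "rule10": 1,
}


def asap_schedule(preds, duration):
    """Return {node: earliest start time}, ignoring resource limits."""
    start = {}
    # static_order() yields every node after all of its predecessors.
    for node in TopologicalSorter(preds).static_order():
        start[node] = max(
            (start[p] + duration[p] for p in preds.get(node, ())),
            default=0,
        )
    return start


start = asap_schedule(PREDS, DURATION)
```

Here every atomic comparison can start at time 0, and each && node can start once its slowest operand has finished.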

These scheduling algorithms basically compile your graph into an ordered list, specifying an ideal (or heuristically good) order in which you should evaluate the operands to get a result for all (complete) condition expressions. However, this is not quite what you need.

The above scheduling algorithms are designed to produce plans when only limited (hardware) resources capable of executing the operations represented by the nodes are available, but they are all concerned with evaluating the full expression.

You'd need to extend this to incorporate the probability that a comparison returns a result that makes calculating further nodes pointless. I don't know how, though, so that's something you'd need to figure out yourself.
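
One possible starting point, which is not something this answer proposes, is the simple case of a single && over independent tests with known costs and known probabilities of being true. Under those assumptions, an adjacent-swap argument shows that the expected cost of short-circuit evaluation is minimized by sorting the tests by cost / P(false). The Python sketch below illustrates this; the function names and the example numbers are made up.

```python
def expected_cost(order):
    """Expected cost of a short-circuit AND over (cost, p_true) tests.

    Assumes the tests are statistically independent.
    """
    total, p_reached = 0.0, 1.0
    for cost, p_true in order:
        total += p_reached * cost   # paid only if all earlier tests passed
        p_reached *= p_true
    return total


def rank_order(tests):
    """Sort by cost / P(false): cheap tests that often fail go first."""
    def rank(test):
        cost, p_true = test
        # A test that never fails cannot stop evaluation early: put it last.
        return cost / (1.0 - p_true) if p_true < 1.0 else float("inf")
    return sorted(tests, key=rank)


# (cost, probability that the test is true); illustrative numbers only.
TESTS = [(20, 0.5), (1, 0.9), (5, 0.1)]
best_order = rank_order(TESTS)
```

This covers only one conjunction at a time; sharing tests between several rules, as in the question, makes the problem harder and is not handled here.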

To do this, I'd write a program that, given some complex conditions and some (preferably large) test data, finds a good schedule (via stochastic hill climbing, simulated annealing, or a genetic algorithm) such that the mean expected time is good, and then let different scheduling algorithms (derived from the above graph) compete against it.
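
A minimal sketch of the hill-climbing variant of such a program might look as follows in Python. It assumes a single && over named tests, and test data given as one dict per request that records whether each test passed; `mean_cost`, `hill_climb`, and the data layout are hypothetical.

```python
import random


def mean_cost(order, costs, samples):
    """Average cost of short-circuit AND evaluation of `order` over samples."""
    total = 0
    for sample in samples:
        for name in order:
            total += costs[name]
            if not sample[name]:
                break  # the conjunction is already false
    return total / len(samples)


def hill_climb(order, costs, samples, steps=200, seed=0):
    """Stochastic hill climbing: swap two tests, keep the swap if it helps."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    best = list(order)
    best_cost = mean_cost(best, costs, samples)
    for _ in range(steps):
        i, j = rng.sample(range(len(best)), 2)
        candidate = list(best)
        candidate[i], candidate[j] = candidate[j], candidate[i]
        cost = mean_cost(candidate, costs, samples)
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best, best_cost
```

Hill climbing can get stuck in a local optimum, which is one reason to compare its result against the other search methods and scheduling algorithms mentioned above.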

All of the above is about static scheduling; there are also dynamic scheduling algorithms that could use the results of the atomic conditions, as they become known step by step, to possibly make better plans.

In general, this is called "boolean algebra". Boolean algebra expressions can be simplified using techniques such as Karnaugh maps or, in general, other circuit minimization techniques. This field goes back to basic digital logic (before digital computers existed) and is very well studied.
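
As a small illustration of the kind of simplification meant here, the Python sketch below checks by exhaustive truth table that the condition "at least one of the two example rules fires" can be factored so that the user test appears only once. The function names and the single-letter variables are made up for this example, and the check treats the four tests as independent booleans.

```python
from itertools import product


def equivalent(f, g, n_vars):
    """True if f and g agree on every assignment of n_vars booleans."""
    return all(
        f(*bits) == g(*bits)
        for bits in product([False, True], repeat=n_vars)
    )


# u: user == 'bar', h: host == 'foo', i: ip matches, z: host == 'baz'
def original(u, h, i, z):
    return (u and h and i) or (z and u)


def factored(u, h, i, z):
    return u and ((h and i) or z)


same = equivalent(original, factored, 4)
```

An exhaustive check like this grows exponentially with the number of variables, so it is only practical for small expressions; Karnaugh maps and other minimization techniques address the general case.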

In specific cases, like searching through linear log files, you are unlikely to speed up your program, because any program searching through unindexed log files is likely to be I/O-bound. Or, in the case of filtering log entries, the cost of evaluating the filter is likely to be minimal compared to the cost of everything else your application does.