
OR predicate optimization

Suppose I have an entity with 3 attributes: A1, A2, A3 such that:

  1. A1 can only have values: 1, 2, 3
  2. A2 can only have values: 10, 20, 30, 40, 50
  3. A3 can only have values: 100, 200

And a number of rules, for example:

R1: (A1 in (1, 2)) AND (A2 in (20, 40, 50)) AND (A3 IN (100))
R2: (A1 in (1, 3)) AND (A2 in (10, 30)) AND (A3 in (200))
R3: (A1 in (1, 2)) AND (A2 in (10)) AND (A3 in (100))

Then there is a predicate R = R1 or R2 or R3, which I would like to minimize. The thing is that A1=1 covers all possible variations of A2 and A3, so we can bring it into a separate clause: R = (A1=1) or (the rest)

I've tried Boolean minimization methods by declaring variables as a=(A1=1), b=(A1=2), ..., k=(A3=200); however, it does not seem to work, because:

  1. the Boolean optimizer is not aware of the full set of values each attribute can take
  2. the Boolean variables are not independent

When I try to address these issues, the expression becomes too complex, and neither QMC (Quine-McCluskey) nor Espresso is able to minimize it in the desired way.

I've also tried storing each-to-each mappings and, whenever one of them contains all the values of another, using it as an aggregation anchor, then removing it and repeating, but this takes an eternity and quite a lot of RAM.

Maybe we can represent the attribute values as sets and approach the problem from a set-theory point of view.

Have you ever faced a problem like this? Are you aware of better ways to solve it? (Heuristics are OK as well.)

One way to optimize the expression for evaluation is to split the rules repeatedly, starting with the attribute that has the fewest values. After this expansion, you can collect the values again for the branches that end in the same final clause.

  1. Make two groups: one for the rules that accept A3 = 100 and one for the rules that accept A3 = 200. A rule can end up in both groups. Then restrict each rule within a group so that it accepts only that group's value and not the other one.

  2. Split each of those groups again by the values of A1, using the same logic.

You would end up with an expanded expression like this:

A3 = 100 AND (
    (A1 = 1 AND A2 IN (10, 20, 40, 50)) OR
    (A1 = 2 AND A2 IN (10, 20, 40, 50)))
OR A3 = 200 AND (
    (A1 = 1 AND A2 IN (10, 30)) OR
    (A1 = 3 AND A2 IN (10, 30)))

Basically, we are constructing a tree with the values for A3 at depth 1, the values for A1 at depth 2, and the values for A2 at depth 3. If there is a path from root to leaf that matches the attribute values, the rule is fulfilled; otherwise it isn't.
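The expansion into a tree could be sketched roughly as follows. This is only an illustration under some assumptions: the rules are encoded as tuples of Python sets, the tree is a nested dict, and the names `RULES`, `expand` and `matches` are made up for this example.

```python
from collections import defaultdict

# Rules from the question, encoded as (A1 values, A2 values, A3 values).
# This tuple-of-sets encoding is an assumption made for the sketch.
RULES = [
    ({1, 2}, {20, 40, 50}, {100}),  # R1
    ({1, 3}, {10, 30}, {200}),      # R2
    ({1, 2}, {10}, {100}),          # R3
]


def expand(rules):
    """Build tree[a3][a1] = set of accepted A2 values (the leaves)."""
    tree = defaultdict(lambda: defaultdict(set))
    for a1_vals, a2_vals, a3_vals in rules:
        for a3 in a3_vals:          # depth 1: split on A3
            for a1 in a1_vals:      # depth 2: split on A1
                tree[a3][a1] |= a2_vals  # depth 3: collect A2 values
    # Convert to plain dicts so lookups never create new branches.
    return {a3: dict(branch) for a3, branch in tree.items()}


def matches(tree, a1, a2, a3):
    """True if a root-to-leaf path exists for the given attribute values."""
    return a2 in tree.get(a3, {}).get(a1, set())


tree = expand(RULES)
```

With this layout, evaluating the predicate is two dict lookups and one set membership test, which is the "optimize for evaluation" part of the idea.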

After that, you can merge all nodes that have the same parent and the same subtree. To do this, compare the leaves of all nodes with the same parent and merge the nodes whose leaves match. Then go one level up, compare the nodes with the same parent there, and so on.

For your example you would end up with this expression:

A3 = 100 AND A1 IN (1, 2) AND A2 IN (10, 20, 40, 50) OR
A3 = 200 AND A1 IN (1, 3) AND A2 IN (10, 30)
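A minimal sketch of the merge step, assuming the expanded tree is stored as a nested dict `tree[a3][a1] = set of A2 values` (the names `TREE`, `merge_siblings` and `merge` are hypothetical, chosen only for this example). Siblings are grouped by converting each subtree to a hashable `frozenset` and using it as a dict key:

```python
# Expanded tree from the expression above: TREE[a3][a1] = accepted A2 values.
TREE = {
    100: {1: {10, 20, 40, 50}, 2: {10, 20, 40, 50}},
    200: {1: {10, 30}, 3: {10, 30}},
}


def merge_siblings(children):
    """Group sibling keys whose (hashable) subtrees are equal.

    Takes {key: subtree} and returns {frozenset_of_keys: subtree}.
    """
    grouped = {}
    for key, subtree in children.items():
        grouped.setdefault(subtree, set()).add(key)
    return {frozenset(keys): subtree for subtree, keys in grouped.items()}


def merge(tree):
    """Merge bottom-up: A1 nodes under each A3 node first, then A3 nodes."""
    a1_merged = {
        a3: frozenset(
            merge_siblings(
                {a1: frozenset(a2s) for a1, a2s in branch.items()}
            ).items()
        )
        for a3, branch in tree.items()
    }
    a3_merged = merge_siblings(a1_merged)
    # Flatten into (A3 values, A1 values, A2 values) clauses.
    return [
        (set(a3s), set(a1s), set(a2s))
        for a3s, branch in a3_merged.items()
        for a1s, a2s in branch
    ]


clauses = merge(TREE)
for a3s, a1s, a2s in clauses:
    print(f"A3 IN {sorted(a3s)} AND A1 IN {sorted(a1s)} AND A2 IN {sorted(a2s)}")
```

Hashing the subtrees avoids comparing every node with every other node: each level is processed in a single pass over its siblings.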

This process is pretty simple and could also shorten the expression, not only optimize it for evaluation. It might not be perfect, but it could be a way to start.
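Since the whole domain here has only 3 × 5 × 2 = 30 combinations, one way to gain confidence in the result is to compare the merged expression with the original R1 or R2 or R3 by brute force. The sketch below does that; the function names `original` and `merged` are made up for the example, and exhaustive enumeration is assumed to be affordable, which may not hold for large domains.

```python
from itertools import product

# Allowed values per attribute, as stated in the question.
DOMAIN = {"A1": [1, 2, 3], "A2": [10, 20, 30, 40, 50], "A3": [100, 200]}


def original(a1, a2, a3):
    """R = R1 or R2 or R3, exactly as written in the question."""
    r1 = a1 in (1, 2) and a2 in (20, 40, 50) and a3 == 100
    r2 = a1 in (1, 3) and a2 in (10, 30) and a3 == 200
    r3 = a1 in (1, 2) and a2 == 10 and a3 == 100
    return r1 or r2 or r3


def merged(a1, a2, a3):
    """The two-clause expression obtained after merging."""
    return (
        (a3 == 100 and a1 in (1, 2) and a2 in (10, 20, 40, 50))
        or (a3 == 200 and a1 in (1, 3) and a2 in (10, 30))
    )


# Collect every combination on which the two expressions disagree.
mismatches = [
    combo
    for combo in product(DOMAIN["A1"], DOMAIN["A2"], DOMAIN["A3"])
    if original(*combo) != merged(*combo)
]
print("equivalent" if not mismatches else f"differs on {mismatches}")
```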
