简体   繁体   中英

Call Python or Lua from C++ to evaluate an expression, calculating unknown variables only if needed

I have an expression/formula like this

 std::string expr="((A>0) && (B>5 || C > 10))";

I have done some research and it seems that if A,B,C values are known, by embedding Lua or Python in a C++ program, there are eval functions that can substitute A,B and C and return true or false .

But what happens when I don't know all the values? Let's say that A is known and it is -1. If A is -1 then the formula will evaluate to "false" irrespective of the values of B or C.

Can I evaluate a formula without knowing all the variables in advance? For example if A is 10, it makes sense to lookup for the value of B and re-evaluate again. How can we solve these problems? Ideas?

I don't know of any existing available library for handling this.

The usual approach would be to build an expression-tree and evaluate what is possible - similarly to constant folding in compilers: https://en.wikipedia.org/wiki/Constant_folding

One of the important aspects for that is to know the allowed values for variables, and thus the allowed partial evaluations, eg x*0 (and 0*x ) is 0 if x is an integer, or finite floating point number, but cannot be evaluated if x is an IEEE floating number (since it could be Nan or infinity), or if x could be matrix since [1,1]*0 is [0,0] not the scalar 0 .

One way is to parse the expression into a tree and evaluate the tree. Subexpressions for which all variables are known will be fully evaluated. The effect will be to simplify the tree.

In your example, the tree has && at the top with two subtrees, the left one being the tree for A>0 . To evaluate the tree, we evaluate the left subtree, which returns -1, and so we don't need to evaluate the right subtree, because the operator is && . The whole tree evaluates to false .

I don't understand exactly what you want to do or understand but I am OK with ivan_pozdeev about short-circuit evaluation and lazy evaluation .

A boolean expression is evaluate from the left to the right and when the result is known the evaluation stop and ignore what is on the right.

With Python:

E = "(A > 0) and (B > 5 or C > 10)"
A = -1
print(eval(E))

gives

False

But

E = "(A > 0) and (B > 5 or C > 10)"
A = 1
print(eval(E))

gives the error " name 'B' is not defined ".

So from what I understand of your question, you want something like

if (A>0) {
  B = getB();
  C = getC();
  if (B>23 || C==11)
    explode();
}

ie your expression must be split up so that you are only ever working with known values.

You can do it like this:

class LazyValues():

    def __init__(self):
        self._known_values = {}

    def __getitem__(self, var):
        try:
            return self._known_values[var]
        except KeyError:
            print("Evaluating %s..." % var)
            return self._known_values.setdefault(var, eval(var))


def lazy_eval(expr, lazy_vars):
    for var in lazy_vars:
        expr  = expr.replace(var, "lazy_values['%s']" % var)
        # will look like ((lazy_value['A']>0) && (lazy_value['B']>5 || lazy_value['C'] > 10))

    lazy_values = LazyValues()
    return eval(expr)


lazy_eval("((A>0) and (B>5 or C > 10))", lazy_vars=['A', 'B', 'C'])

# Evaluating A...
# ....
# NameError: name 'A' is not defined

A = -1
lazy_eval("((A>0) and (B>5 or C > 10))", lazy_vars=['A', 'B', 'C'])

#Evaluating A...
#False

A = 5
B = 6
lazy_eval("((A>0) and (B>5 or C > 10))", lazy_vars=['A', 'B', 'C'])

# Evaluating A...
# Evaluating B...
# True

More details later...

While this is a very crude implementation of your solution, but it suits your situation perfectly although using a lot of if else and Exception Handling.

def main_func():
    def checker(a, b=None, c=None):
        if a is None:
            del a
        if b is None:
            del b
        if c is None:
            del c
        d = eval('a>0 and (b>5 or c>10)')
        return d
    return checker

def doer(a=None, b=None, c=None):
    try:
        return main_func()(a,b,c)
    except NameError as e:
        if "'a' is not" in str(e):
            return 'a'
        elif "'b' is not" in str(e):
            return 'b'
        elif "'c' is not" in str(e):
            return 'c'

def check_ret(ret):
    return type(ret) == bool

def actual_evaluator():
    getter = {
        "a": get_a,
        "b": get_b,
        "c": get_c
    }
    args = []
    while True:
        ret = doer(*tuple(args))
        if not check_ret(ret):
            val = getter[ret]()
            args.append(val)
        else:
            return ret

if __name__ == '__main__':
        print actual_evaluator()

Now explaining my code, main_func returns another function which is used to evaluate the given expression in a string. While here the string has been hardcoded, you can always pass it as a parameter to the function and replace the string inside eval with the parameter.

In doer , the function returned by main_func is invoked and if a NameError is thrown, which happens in case of the previous conditions being false and new values to be calculated then it returns the specific variable which needs to be calculated. All of that is checked in the actual_evaluator where the values of the variables are fetched via some function get_variable_name which you can define in your getter dict. In my code, I'd used random numbers to check the validity, but like you said you'd have to evaluate the various variables by other means so you can call the respective functions.

It seems to me that the answer is yes, yes you can try to evaluate the expression with missing information. You will need to define what happens though when a symbol lookup fails.

In you case, you will need a boolean expression evaluator and a symbol table so the evaluator can lookup the symbols to execute the expression.

If you succeed in looking up all the symbols, the result would be true or false. If you fail to look up a symbol, then handle that case, perhaps returning None, nullptr, or raising / throwing an exception.

I believe you can embed the python interpreter in your c++ program and call a function to evaluate the expression, more importantly, you can give it the dict to use as the symbol table. If the call returns a result, it was able to find enough symbols or shortcut to a result, otherwise it will raise an exception which your c++ code can detect.

You can prototype the function in python the evaluate if the approach works they way you want, then embed.

Or you can do it all in c++, with a grammar, lexer, parser, and elevator.

Due to the short-circuit behaviour, Python can evaluate an expression even without all the contained values defined if it's possible. If not, it raises an exception:

In [1]: a= False

In [3]: a and b
Out[3]: False

In [4]: a or b
NameError: name 'b' is not defined

But the expression is evaluated left to right:

In [5]: b and a
NameError: name 'b' is not defined

I have done a "roll-my-own" approach for this in the past. It's not that difficult for simple things; you simply create your own objects that implement magic math methods and that keep track of other objects.

If you need something more full-featured, the sympy project is designed to do symbolic math...

I'd take a look at sympy or other computer algebra systems. I believe that algebraic simplification of the pxeression plus short circuit evaluation will allow you to evaluate all the cases where it is possible to get a result. There are some cases where you need to know the value of some variable. For example if you have a simple expression like a == b, you are going to make no progress without knowing the value of a and b. However something like (a >= 0) ||(a <= 0), algebraic simplification will result in true assuming that a is not NAN or some other value that is not equal to itself.

It sounds like you have two challenges:

  1. It is expensive to calculate some variable values, so you want to avoid calculating values that aren't needed to evaluate the expression; and
  2. Your expression exists as a string, composed at runtime, so you can't use C++'s built-in short-circuiting logic.

This means you need some way to evaluate an expression at runtime, and you would like to take advantage of short-circuit logic if possible. Python could be a good choice for this, as shown in the example below.

There is a short Python script ( evaluate.py ), which defines an evaluate() function which can be called from your C or C++ program. The evaluate() function will attempt to evaluate the expression you give it (translating "&&" and "||" to "and" and "or" if needed). If it requires a variable which hasn't been defined yet, it will retrieve a value for that variable by calling a get_var_value() function defined in the C/C++ program (and then cache the value for later use).

This approach will use normal short-circuit behavior, so it will only request the variable values needed to finish evaluating the expression. Note that this won't rearrange the expression to choose the minimal set of variables needed to evaluate it; it just uses the standard short-circuiting behavior.

UPDATE: I've added an example at the end that defines the Python script using a multiline string literal in the .cpp file. This could be useful if you don't want to install a separate evaluate.py file along with your executable. It also simplifies the Python initialization a little bit.

The C/Python interaction in the scripts below is based on code in https://docs.python.org/2/extending/embedding.html and https://docs.python.org/2/c-api/arg.html .

Here are the files:

evaluate.py (Python script)

# load embedded_methods module defined by the parent C program
from embedded_methods import get_var_value

# define a custom dictionary class that calls get_var_value(key) for any missing keys.
class var_dict(dict):
    def __missing__(self, var):
        self[var] = val = get_var_value(var)
        return val

# define a function which can be called by the parent C program
def evaluate(expr):
    # Create a dictionary to use as a namespace for the evaluation (this version 
    # will automatically request missing variables).
    # Move this line up to the module level to retain values between calls.
    namespace = var_dict()

    # convert C-style Boolean operators to Python-style
    py_expr = expr.replace("||", " or ").replace("&&", " and ").replace("  ", " ")

    print('evaluating expression "{}" as "{}"'.format(expr, py_expr))

    # evaluate the expression, retrieving variable values as needed
    return eval(py_expr, namespace)

evaluate.c (your main program; could also be evaluate.cpp, compiled with g++)

// on Mac, compile with gcc -o evaluate evaluate.c -framework Python
#include <Python/Python.h>  // Mac
// #include <Python.h> // non-Mac?

// retain values of argc and argv for equation evaluation
int argc;
char **argv;

/* 
   Calculate the value of a named variable; this is called from the Python 
   script to obtain any values needed to evaluate the expression. 
*/
static PyObject* c_get_var_value(PyObject *self, PyObject *args)
{
    int var_num;
    char *var_name;
    char err_string[100];
    long var_value;
    if(!PyArg_ParseTuple(args, "s:get_var_value", &var_name)) {
        PyErr_SetString(PyExc_ValueError, "Invalid arguments passed to get_var_value()");
        return NULL;
    }
    // change the code below to define your variable values
    // This version just assumes A, B, C are given by argv[2], argv[3], argv[4], etc.
    printf("looking up value of %s: ", var_name);
    var_num = var_name[0]-'A';
    if (strlen(var_name) != 1 || var_num < 0 || var_num >= argc-2) {
        printf("%s\n", "unknown");
        snprintf(
            err_string, sizeof(err_string), 
            "Value requested for unknown variable \"%s\"", var_name
        );
        PyErr_SetString(PyExc_ValueError, err_string);
        return NULL;  // will raise exception in Python
    } else {
        var_value = atoi(argv[2+var_num]);
        printf("%ld\n", var_value);
        return Py_BuildValue("l", var_value);
    }
}

// list of methods to be added to the "embedded_methods" module
static PyMethodDef c_methods[] = {
    {"get_var_value", c_get_var_value, METH_VARARGS, // could use METH_O
     "Retrieve the value for the specified variable."},
    {NULL, NULL, 0, NULL} // sentinel for end of list
};

int main(int ac, char *av[])
{
    PyObject *p_module, *p_evaluate, *p_args, *p_result;
    long result;
    const char* expr;

    // cache and evaluate arguments
    argc = ac;
    argv = av;
    if (argc < 2) {
        fprintf(
            stderr, 
            "Usage: %s \"expr\" A B C ...\n"
            "e.g.,  %s \"((A>0) && (B>5 || C > 10))\" 10 9 -1\n", 
            argv[0], argv[0]
        );
        return 1;
    }
    expr = argv[1];

    // initialize Python
    Py_SetProgramName(argv[0]);
    Py_Initialize();
    // Set system path to include the directory where this executable is stored
    // (to find evaluate.py later)
    PySys_SetArgv(argc, argv);

    // attach custom module with get_var_value() function
    Py_InitModule("embedded_methods", c_methods);

    // Load evaluate.py
    p_module = PyImport_ImportModule("evaluate");
    if (PyErr_Occurred()) { PyErr_Print(); }
    if (p_module == NULL) {
        fprintf(stderr, "unable to load evaluate.py\n");
        return 1;
    }

    // get a reference to the evaluate() function
    p_evaluate = PyObject_GetAttrString(p_module, "evaluate");
    if (!(p_evaluate && PyCallable_Check(p_evaluate))) {
        fprintf(stderr, "Cannot retrieve evaluate() function from evaluate.py module\n");
        return 1;
    }

     /*
        Call the Python evaluate() function with the expression to be evaluated.
        The evaluate() function will call c_get_var_value() to obtain any
        variable values needed to evaluate the expression. It will use 
        caching and normal logical short-circuiting to reduce the number 
        of requests.
     */
    p_args = Py_BuildValue("(s)", expr);
    p_result = PyObject_CallObject(p_evaluate, p_args);
    Py_DECREF(p_args);
    if (PyErr_Occurred()) {
        PyErr_Print();
        return 1;
    }
    result = PyInt_AsLong(p_result);
    Py_DECREF(p_result);

    printf("result was %ld\n", result);

    Py_DECREF(p_evaluate);
    Py_DECREF(p_module);
    return 0;
}

Results:

$ evaluate "((A>0) && (B>5 || C > 10))" -1 9 -1
evaluating expression "((A>0) && (B>5 || C > 10))" as "((A>0) and (B>5 or C > 10))"
looking up value of A: -1
result was 0

$ evaluate "((A>0) && (B>5 || C > 10))" 10 9 -1
evaluating expression "((A>0) && (B>5 || C > 10))" as "((A>0) and (B>5 or C > 10))"
looking up value of A: 10
looking up value of B: 9
result was 1

$ evaluate "((A>0) && (B>5 || C > 10))" 10 3 -1
evaluating expression "((A>0) && (B>5 || C > 10))" as "((A>0) and (B>5 or C > 10))"
looking up value of A: 10
looking up value of B: 3
looking up value of C: -1
result was 0

As an alternative, you can combine all this code into a single .cpp file, as shown below. This uses the multi-line string literal capability in C++11.

Self-contained evaluate.cpp

// on Mac, compile with g++ evaluate.cpp -o evaluate -std=c++11 -framework Python
#include <Python/Python.h>  // Mac
//#include <Python.h> // non-Mac?

/* 
   Python script to be run in embedded interpreter.
   This defines an evaluate(expr) function which will interpret an expression
   and return the result. If any variable values are needed, it will call the
   get_var_values(var) function defined in the parent C++ program
*/
const char* py_script = R"(
# load embedded_methods module defined by the parent C program
from embedded_methods import get_var_value

# define a custom dictionary class that calls get_var_value(key) for any missing keys.
class var_dict(dict):
    def __missing__(self, var):
        self[var] = val = get_var_value(var)
        return val

# define a function which can be called by the parent C program
def evaluate(expr):
    # Create a dictionary to use as a namespace for the evaluation (this version 
    # will automatically request missing variables).
    # Move this line up to the module level to retain values between calls.
    namespace = var_dict()

    # convert C-style Boolean operators to Python-style
    py_expr = expr.replace("||", " or ").replace("&&", " and ").replace("  ", " ")

    print('evaluating expression "{}" as "{}"'.format(expr, py_expr))

    # evaluate the expression, retrieving variable values as needed
    return eval(py_expr, namespace)
)";

// retain values of argc and argv for equation evaluation
int argc;
char **argv;

/* 
   Calculate the value of a named variable; this is called from the Python 
   script to obtain any values needed to evaluate the expression. 
*/
static PyObject* c_get_var_value(PyObject *self, PyObject *args)
{
    int var_num;
    char *var_name;
    char err_string[100];
    long var_value;
    if(!PyArg_ParseTuple(args, "s:get_var_value", &var_name)) {
        PyErr_SetString(PyExc_ValueError, "Invalid arguments passed to get_var_value()");
        return NULL;
    }
    // change the code below to define your variable values
    // This version just assumes A, B, C are given by argv[2], argv[3], argv[4], etc.
    printf("looking up value of %s: ", var_name);
    var_num = var_name[0]-'A';
    if (strlen(var_name) != 1 || var_num < 0 || var_num >= argc-2) {
        printf("%s\n", "unknown");
        snprintf(
            err_string, sizeof(err_string), 
            "Value requested for unknown variable \"%s\"", var_name
        );
        PyErr_SetString(PyExc_ValueError, err_string);
        return NULL;  // will raise exception in Python
    } else {
        var_value = atoi(argv[2+var_num]);
        printf("%ld\n", var_value);
        return Py_BuildValue("l", var_value);
    }
}

// list of methods to be added to the "embedded_methods" module
static PyMethodDef c_methods[] = {
    {"get_var_value", c_get_var_value, METH_VARARGS, // could use METH_O
     "Retrieve the value for the specified variable."},
    {NULL, NULL, 0, NULL} // sentinel for end of list
};

int main(int ac, char *av[])
{
    PyObject *p_module, *p_evaluate, *p_args, *p_result;
    long result;
    const char* expr;

    // cache and evaluate arguments
    argc = ac;
    argv = av;
    if (argc < 2) {
        fprintf(
            stderr, 
            "Usage: %s \"expr\" A B C ...\n"
            "e.g.,  %s \"((A>0) && (B>5 || C > 10))\" 10 9 -1\n", 
            argv[0], argv[0]
        );
        return 1;
    }
    expr = argv[1];

    // initialize Python
    Py_SetProgramName(argv[0]);
    Py_Initialize();

    // attach custom module with get_var_value() function
    Py_InitModule("embedded_methods", c_methods);

    // run script to define evalute() function
    PyRun_SimpleString(py_script);
    if (PyErr_Occurred()) {
        PyErr_Print(); 
        fprintf(stderr, "%s\n", "unable to run Python script");
        return 1;
    }

    // get a reference to the Python evaluate() function (can be reused later)
    // (note: PyRun_SimpleString creates objects in the __main__ module)
    p_module = PyImport_AddModule("__main__");
    p_evaluate = PyObject_GetAttrString(p_module, "evaluate");
    if (!(p_evaluate && PyCallable_Check(p_evaluate))) {
        fprintf(stderr, "%s\n", "Cannot retrieve evaluate() function from __main__ module");
        return 1;
    }

    /*
       Call the Python evaluate() function with the expression to be evaluated.
       The evaluate() function will call c_get_var_value() to obtain any
       variable values needed to evaluate the expression. It will use 
       caching and normal logical short-circuiting to reduce the number 
       of requests.
    */
    p_args = Py_BuildValue("(s)", expr);
    p_result = PyObject_CallObject(p_evaluate, p_args);
    Py_DECREF(p_args);
    if (PyErr_Occurred()) {
        PyErr_Print();
        return 1;
    }
    result = PyInt_AsLong(p_result);
    Py_DECREF(p_result);

    printf("result was %ld\n", result);

    Py_DECREF(p_module);
    Py_DECREF(p_evaluate);
    return 0;
}

I am assuming from this question that

  1. you have a logical expression that depends on the result of various functions;
  2. you use the values of some of those functions more than once (possibly before evaluating this expression, or possibly within this expression), so you want to store their results to avoid calling them twice; and
  3. you would like to evaluate the logical expression, and along the way you would like to retrieve and store values for functions that haven't previously been run, but only enough of them to evaluate the expression (using normal short-circuit behavior).

I mentioned in a different answer that you might do best just to use the built-in short-circuit behavior in C++. To do that and achieve objective 2, you would need to use functions instead of variables in the logical expression. That way you can trigger the calculation of a missing value when the expression needs it.

Below are two approaches to doing that. The first one wraps your slow functions with a general-purpose caching wrapper. The second defines a custom, caching helper for each of your slow functions. After compiling, either of these should be called with your A, B and C values for testing, eg evaluate_cached 10 9 -1 . They will both behave the way you want.

evaluate_cached.cpp

# include <stdio.h>
# include <stdlib.h>
# include <unordered_map>

static char **args;

// define (slow) functions to calculate each of the needed values
int A() {
  printf("Calculating value for A\n");
  return atoi(args[1]);
}

int B() {
  printf("Calculating value for B\n");
  return atoi(args[2]);
}

int C() {
  printf("Calculating value for C\n");
  return atoi(args[3]);
}

typedef int (*int_func)(void);

// wrapper to cache results of other functions
int cached(int_func func) {
    // Create an unordered_map to hold function results
    static std::unordered_map<int_func, int> results;

    if (results.find(func) == results.end()) {
        // function hasn't been called before; call and cache results
        results[func] = func();
    }
    return results[func];
}

int main(int argc, char *argv[])
{
    if (argc!=4) {
        fprintf(stderr, "%s must be called with 3 values for A, B and C.\n", argv[0]);
        return 1;
    } else {
        args = argv;
    }
    // do the evaluation, with short-circuiting
    if (((cached(A)>0) && (cached(B)>5 || cached(C) > 10))) {
        printf("condition was true\n");
    } else {
        printf("condition was false\n");
    }
    return 0;
}

evaluate_helpers.c

# include <stdio.h>
# include <stdlib.h>

static char **args;

// define (slow) functions to calculate each of the needed values
int calculate_A() {
  printf("Calculating value for A\n");
  return atoi(args[1]);
}

int calculate_B() {
  printf("Calculating value for B\n");
  return atoi(args[2]);
}

int calculate_C() {
  printf("Calculating value for C\n");
  return atoi(args[3]);
}

// define functions to retrieve values as needed,
// with caching to avoid double-calculation
int A() {
  static int val, set=0;
  if (!set) val=calculate_A();
  return val;
}
int B() {
  static int val, set=0;
  if (!set) val=calculate_B();
  return val;
}
int C() {
  static int val, set=0;
  if (!set) val=calculate_B();
  return val;
}

int main(int argc, char *argv[])
{
    if (argc!=4) {
        fprintf(stderr, "%s must be called with 3 values for A, B and C.\n", argv[0]);
        return 1;
    } else {
        args = argv;
    }
    // do the evaluation, with short-circuiting
    if (((A()>0) && (B()>5 || C() > 10))) {
        printf("condition was true\n");
    } else {
        printf("condition was false\n");
    }
    return 0;
}

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM