Python: perform an operation given in a string

So I'm trying to pass a user-defined operation into a function as a string, and I'm having trouble finding a good way of doing it. All I can think of is hard-coding every option into the function, like this:

def DoThings(Conditions):
    import re
    import pandas as pd
    d = {'time' : pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd']),
         'legnth' : pd.Series([4., 5., 6., 7.], index=['a', 'b', 'c', 'd'])}
    df = pd.DataFrame(d)
    print(df)

    for Condition in Conditions:
        # Split the condition into two parts
        SplitCondition = re.split('<=|>=|!=|<|>|=',Condition)

        # If the right side of the conditional statement is a number convert it to a float
        if SplitCondition[1].isdigit():
            SplitCondition[1] = float(SplitCondition[1])

        # Perform the condition specified
        if "<=" in Condition:
            df = df[df[SplitCondition[0]]<=SplitCondition[1]]
            print("one")
        elif ">=" in Condition:
            df = df[df[SplitCondition[0]]>=SplitCondition[1]]
            print("two")
        elif "!=" in Condition:
            df = df[df[SplitCondition[0]]!=SplitCondition[1]]
            print("three")
        elif "<" in Condition:
            df = df[df[SplitCondition[0]]<=SplitCondition[1]]
            print("four")
        elif ">" in Condition:
            df = df[df[SplitCondition[0]]>=SplitCondition[1]]
            print("five")
        elif "=" in Condition:
            df = df[df[SplitCondition[0]]==SplitCondition[1]]
            print("six")
    return df

# Specify the conditions
Conditions = ["time>2","legnth<=6"]
df = DoThings(Conditions)   # Call the function

print(df)

Which results in this:

   legnth  time
a       4     1
b       5     2
c       6     3
d       7     4
five
one
   legnth  time
c       6     3

This is all well and good, but I'm wondering if there is a better or more efficient way of passing conditions into a function without writing out every possible if statement. Any ideas?

SOLUTION:

import operator
import re

import pandas as pd

# Map each comparison token to the matching function from the operator module
ops = {'<=': operator.le, '>=': operator.ge, '!=': operator.ne, '<': operator.lt, '>': operator.gt, '=': operator.eq}

def DoThings(Conditions):
    d = {'time' : pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd']),
         'legnth' : pd.Series([4., 5., 6., 7.], index=['a', 'b', 'c', 'd'])}
    df = pd.DataFrame(d)
    print(df)

    for Condition in Conditions:
        # Split the condition into two parts
        SplitCondition = re.split('<=|>=|!=|<|>|=',Condition)

        # If the right side is a number (e.g. '2', '2.5', '-1') convert it to a float
        try:
            SplitCondition[1] = float(SplitCondition[1])
        except ValueError:
            pass

        # Find the operator used in the condition and apply the matching function
        cond = re.findall(r'<=|>=|!=|<|>|=', Condition)
        df = df[ops[cond[0]](df[SplitCondition[0]],SplitCondition[1])]

    return df



# Specify the conditions
Conditions = ["time>2","legnth<=6"]
df = DoThings(Conditions)   # Call the function

print(df)

Output:

   legnth  time
a       4     1
b       5     2
c       6     3
d       7     4
   legnth  time
c       6     3

You can access the built-in operators via the operator module, and then build a table mapping your operator names to the built-in ones, like in this cut-down example:

import operator
ops = {'<=': operator.le, '>=': operator.ge}

In [3]: ops['>='](2, 1)
Out[3]: True
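Extending that table into a small condition parser might look like the following. This is a minimal sketch, assuming each condition has the form name, operator, value (spaces optional); the '==' entry, the regular expression and the parse_condition helper are hypothetical additions, not part of the original answer. It runs on plain dict rows so it needs no pandas, but the same operator functions also work on a pandas Series, because pandas overloads the comparison operators.

```python
import operator
import re

# Map each comparison token to the matching function from the operator module.
# '==' is accepted alongside the question's '=' (an assumption, not in the original).
OPS = {
    '<=': operator.le,
    '>=': operator.ge,
    '!=': operator.ne,
    '==': operator.eq,
    '<': operator.lt,
    '>': operator.gt,
    '=': operator.eq,
}

# Two-character operators come first so '<=' is not read as '<' followed by '='.
CONDITION_RE = re.compile(r'^\s*(\w+)\s*(<=|>=|!=|==|<|>|=)\s*(.+?)\s*$')


def parse_condition(text):
    """Split a string such as 'time>2' into (name, comparison function, value)."""
    match = CONDITION_RE.match(text)
    if match is None:
        raise ValueError('cannot parse condition: %r' % text)
    name, symbol, raw_value = match.groups()
    try:
        value = float(raw_value)   # numbers such as '2', '2.5' or '-1'
    except ValueError:
        value = raw_value          # anything else is compared as a string
    return name, OPS[symbol], value


# Hypothetical rows mirroring the question's data, as plain dicts.
rows = [
    {'time': 1.0, 'legnth': 4.0},
    {'time': 2.0, 'legnth': 5.0},
    {'time': 3.0, 'legnth': 6.0},
    {'time': 4.0, 'legnth': 7.0},
]

# Apply each condition in turn, keeping only the rows that satisfy it.
for condition in ["time>2", "legnth<=6"]:
    name, compare, value = parse_condition(condition)
    rows = [row for row in rows if compare(row[name], value)]

print(rows)  # [{'time': 3.0, 'legnth': 6.0}]
```

Using one regular expression with groups avoids splitting and searching the string twice, and the float fallback keeps non-numeric right-hand sides as strings.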

You can use boolean masking to do this kind of operation (you will find it a lot faster):

In [21]: df[(df.legnth <= 6) & (df.time > 2)]
Out[21]:
   legnth  time
c       6     3

In [22]: df[(df.legnth <= 6) & (df.time >= 2)]
Out[22]:
   legnth  time
b       5     2
c       6     3

Note: there's a bug in your implementation: the "<" and ">" branches actually apply <= and >=, so row b (time == 2) survives "time>2" even though it should not be included in your query.
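The difference between > and >= on the time column can be checked directly. A minimal sketch rebuilding the question's data (the column name legnth is kept as spelled in the question):

```python
import pandas as pd

# The question's data, rebuilt with plain lists.
df = pd.DataFrame({'time': [1., 2., 3., 4.], 'legnth': [4., 5., 6., 7.]},
                  index=['a', 'b', 'c', 'd'])

strict = df[df['time'] > 2].index.tolist()      # what "time>2" should select
inclusive = df[df['time'] >= 2].index.tolist()  # what the ">" branch actually selects

print(strict)     # ['c', 'd']
print(inclusive)  # ['b', 'c', 'd']
```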

You can also combine conditions with or (using |), which works as you would expect:

In [23]: df[(df.legnth == 4) | (df.time == 4)]
Out[23]:
   legnth  time
a       4     1
d       7     4
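To apply a whole list of conditions with masking, one option is to build one boolean mask per condition and fold them together with functools.reduce. A minimal sketch, assuming the conditions have already been parsed into hypothetical (column, function, value) triples rather than strings:

```python
import functools
import operator

import pandas as pd

# Same data as in the question (column name 'legnth' kept as spelled there).
df = pd.DataFrame({'time': [1., 2., 3., 4.], 'legnth': [4., 5., 6., 7.]},
                  index=['a', 'b', 'c', 'd'])

# Each condition is an explicit (column, comparison function, value) triple.
conditions = [('time', operator.gt, 2), ('legnth', operator.le, 6)]

# One boolean Series per condition, all aligned on df's index.
masks = [compare(df[column], value) for column, compare, value in conditions]

# Combine the masks with & (operator.and_); pass operator.or_ instead for |.
combined = functools.reduce(operator.and_, masks)

print(df[combined])  # only row c remains
```

The DataFrame is indexed once with the combined mask instead of being filtered condition by condition.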

Starting with pandas 0.13 you can do the following, all of which are equivalent:

res = df.query('(legnth == 4) | (time == 4)')
res = df.query('legnth == 4 | time == 4')
res = df.query('legnth == 4 or time == 4')

A string passed straight to df[...], as in res = df['legnth == 4 or time == 4'], does not work in current pandas: the string is looked up as a column label and raises a KeyError, so stick with query (or df[df.eval('legnth == 4 or time == 4')]).

query accepts an arbitrary boolean expression and automatically resolves each name in it as a column of the calling frame (in current pandas, local and global Python variables are referenced with an @ prefix, e.g. @limit). This lets you 1) express queries more succinctly than typing df. in front of everything, 2) use syntax that, let's face it, looks better than ugly bitwise operators, 3) potentially run much faster than the "pure" Python equivalent if you have huge frames and a very complex expression, and finally 4) pass the same query (after all, it is just a string) to multiple frames that share the columns it uses.
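A sketch of reusing one query string across frames and referencing a local variable, assuming a pandas version that supports the @ prefix. The frames df1 and df2 and the variable limit are made up for illustration. Note that query expects Python-style == for equality, so a condition written as time=2 in the question's style would need rewriting first.

```python
import pandas as pd

# Two hypothetical frames that share the 'time' and 'legnth' columns.
df1 = pd.DataFrame({'time': [1., 2., 3., 4.], 'legnth': [4., 5., 6., 7.]},
                   index=['a', 'b', 'c', 'd'])
df2 = pd.DataFrame({'time': [4., 5.], 'legnth': [2., 9.], 'width': [1., 1.]},
                   index=['x', 'y'])

# The question's conditions joined into one query string.
conditions = ["time > 2", "legnth <= 6"]
expr = ' and '.join('(%s)' % c for c in conditions)

res1 = df1.query(expr)  # row c
res2 = df2.query(expr)  # row x (the extra 'width' column is simply ignored)

# A local Python variable is referenced with an @ prefix.
limit = 4
res3 = df1.query('time >= @limit or legnth == @limit')  # rows a and d
```

Because expr is an ordinary string, it can be built from user input once and then applied to any frame that has the referenced columns.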

The technical post webpages of this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.
