I'm trying to pass a user-defined comparison (such as "time>2") into a function, and I'm having trouble finding a good way of doing it. All I can think of is to hard-code every option into the function, like this:
import re

import pandas as pd


def DoThings(Conditions):
    d = {'time': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd']),
         'legnth': pd.Series([4., 5., 6., 7.], index=['a', 'b', 'c', 'd'])}
    df = pd.DataFrame(d)
    print(df)
    for Condition in Conditions:
        # Split the condition into two parts
        SplitCondition = re.split('<=|>=|!=|<|>|=', Condition)
        # If the right side of the conditional statement is a number, convert it to a float
        if SplitCondition[1].isdigit():
            SplitCondition[1] = float(SplitCondition[1])
        # Perform the condition specified; the two-character operators
        # have to be checked before "<", ">" and "="
        if "<=" in Condition:
            df = df[df[SplitCondition[0]] <= SplitCondition[1]]
            print("one")
        elif ">=" in Condition:
            df = df[df[SplitCondition[0]] >= SplitCondition[1]]
            print("two")
        elif "!=" in Condition:
            df = df[df[SplitCondition[0]] != SplitCondition[1]]
            print("three")
        elif "<" in Condition:
            # as written, this branch applies <= rather than <
            df = df[df[SplitCondition[0]] <= SplitCondition[1]]
            print("four")
        elif ">" in Condition:
            # as written, this branch applies >= rather than >
            df = df[df[SplitCondition[0]] >= SplitCondition[1]]
            print("five")
        elif "=" in Condition:
            df = df[df[SplitCondition[0]] == SplitCondition[1]]
            print("six")
    return df


# Specify the conditions
Conditions = ["time>2", "legnth<=6"]
df = DoThings(Conditions)  # Call the function
print(df)
Which results in this:
   legnth  time
a       4     1
b       5     2
c       6     3
d       7     4
five
one
   legnth  time
c       6     3
This is all well and good, but I'm wondering if there is a better or more efficient way of passing conditions into a function without writing out every possible if statement. Any ideas?
SOLUTION:
import operator
import re

import pandas as pd

# Map each comparison token to the matching built-in operator function
ops = {'<=': operator.le, '>=': operator.ge, '!=': operator.ne,
       '<': operator.lt, '>': operator.gt, '=': operator.eq}


def DoThings(Conditions):
    d = {'time': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd']),
         'legnth': pd.Series([4., 5., 6., 7.], index=['a', 'b', 'c', 'd'])}
    df = pd.DataFrame(d)
    print(df)
    for Condition in Conditions:
        # Split the condition into the column name and the value
        SplitCondition = re.split('<=|>=|!=|<|>|=', Condition)
        # If the right side is a number (including "2.5" or "-1"), convert it to a float
        try:
            SplitCondition[1] = float(SplitCondition[1])
        except ValueError:
            pass
        # Find the operator token and use the matching function to build a boolean mask
        cond = re.findall(r'<=|>=|!=|<|>|=', Condition)
        df = df[ops[cond[0]](df[SplitCondition[0]], SplitCondition[1])]
    return df


# Specify the conditions
Conditions = ["time>2", "legnth<=6"]
df = DoThings(Conditions)  # Call the function
print(df)
Output:
   legnth  time
a       4     1
b       5     2
c       6     3
d       7     4
   legnth  time
c       6     3
You can access the built-in operators via the operator module, and then build a table mapping your operator names to the built-in ones, like in this cut-down example:
import operator
ops = {'<=': operator.le, '>=': operator.ge}
In [3]: ops['>='](2, 1)
Out[3]: True
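Building on that table, here is a minimal sketch of a parser that turns a string such as "time>2" into a (column, function, value) triple. The names OPS, CONDITION_RE and parse_condition are hypothetical, the "==" entry is an addition that is not in the table above, and it assumes column names are plain identifiers (letters, digits, underscores):

```python
import operator
import re

# Map each comparison token to the built-in operator function.
OPS = {
    '<=': operator.le,
    '>=': operator.ge,
    '!=': operator.ne,
    '==': operator.eq,  # assumption: also accept Python-style equality
    '<': operator.lt,
    '>': operator.gt,
    '=': operator.eq,
}

# Capture "<column><op><value>". Two-character operators come first in the
# alternation so that "<=" is not matched as "<" followed by "=".
CONDITION_RE = re.compile(r'^\s*(\w+)\s*(<=|>=|!=|==|<|>|=)\s*(.+?)\s*$')


def parse_condition(text):
    """Split a string like 'time>2' into (column, operator function, value)."""
    match = CONDITION_RE.match(text)
    if match is None:
        raise ValueError('cannot parse condition: %r' % text)
    column, token, raw_value = match.groups()
    # Convert the right-hand side to a float when possible (this also handles
    # "2.5" and "-1", which str.isdigit() rejects); otherwise keep the string.
    try:
        value = float(raw_value)
    except ValueError:
        value = raw_value
    return column, OPS[token], value


column, op, value = parse_condition('legnth<=6')
print(column, op(5.0, value), op(7.0, value))  # legnth True False
```

Applying a parsed condition to a DataFrame is then a single line, df = df[op(df[column], value)], with no if/elif chain.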
You can use boolean masks to do this kind of filtering directly (you will find it a lot faster):
In [21]: df[(df.legnth <= 6) & (df.time > 2)]
Out[21]:
   legnth  time
c       6     3
In [22]: df[(df.legnth <= 6) & (df.time >= 2)]
Out[22]:
   legnth  time
b       5     2
c       6     3
Note: there's a bug in your implementation: the < and > branches actually apply <= and >=, so b (where time == 2) ends up in the result even though it should not be included in your query.
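When the conditions arrive as a list, the masks can be built in a loop and combined with & via functools.reduce. The parentheses in the expressions above are required because & and | bind more tightly than comparison operators such as <= and >. A minimal sketch, assuming the conditions have already been parsed into hypothetical (column, operator function, value) triples:

```python
import operator
from functools import reduce

import pandas as pd

df = pd.DataFrame({'time': [1., 2., 3., 4.], 'legnth': [4., 5., 6., 7.]},
                  index=['a', 'b', 'c', 'd'])

# Hypothetical input format: one (column, operator function, value) per condition.
conditions = [('time', operator.gt, 2), ('legnth', operator.le, 6)]

# Build one boolean Series per condition, then AND them together so the
# frame is indexed only once.
masks = [op(df[column], value) for column, op, value in conditions]
combined = reduce(operator.and_, masks)
result = df[combined]
print(result)
```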
You can also do "or" operations (using |), which work as you would expect:
In [23]: df[(df.legnth == 4) | (df.time == 4)]
Out[23]:
   legnth  time
a       4     1
d       7     4
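The same reduce pattern covers "or": fold the masks with operator.or_ instead of operator.and_, so a row is kept if any alternative holds. A sketch assuming each alternative is given as a hypothetical (column, operator function, value) triple:

```python
import operator
from functools import reduce

import pandas as pd

df = pd.DataFrame({'time': [1., 2., 3., 4.], 'legnth': [4., 5., 6., 7.]},
                  index=['a', 'b', 'c', 'd'])

# Hypothetical alternatives: keep a row if ANY of them is true.
alternatives = [('legnth', operator.eq, 4), ('time', operator.eq, 4)]

# One boolean Series per alternative, OR-ed together with |.
masks = [op(df[column], value) for column, op, value in alternatives]
either = df[reduce(operator.or_, masks)]
print(either)
```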
In pandas 0.13 (not sure when that will be released; 0.12 just came out) you'll be able to do the following, all of which are equivalent:
res = df.query('(legnth == 4) | (time == 4)')
res = df.query('legnth == 4 | time == 4')
res = df.query('legnth == 4 or time == 4')
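Because query takes a string, the user's conditions can be joined into a single expression instead of being parsed by hand. A minimal sketch, assuming current pandas and trusted input (query evaluates the expression, so it should not be fed arbitrary text); the helper name to_query is hypothetical, and it rewrites a lone "=" to "==" because query expects Python comparison syntax:

```python
import re

import pandas as pd

df = pd.DataFrame({'time': [1., 2., 3., 4.], 'legnth': [4., 5., 6., 7.]},
                  index=['a', 'b', 'c', 'd'])


def to_query(conditions):
    """Join condition strings into one query expression (hypothetical helper)."""
    # Turn a lone "=" into "=="; the lookarounds leave "<=", ">=", "!=" and
    # "==" untouched.
    normalized = [re.sub(r'(?<![<>!=])=(?!=)', '==', c) for c in conditions]
    # Parenthesize each condition and AND them together.
    return ' and '.join('(%s)' % c for c in normalized)


expression = to_query(["time>2", "legnth<=6"])
print(expression)  # (time>2) and (legnth<=6)
result = df.query(expression)
print(result)
```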
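and my personal favorite
res = df['legnth == 4 or time == 4']
(In current pandas, a string inside df[...] is looked up as a column label, so this form raises a KeyError; use query for string expressions.)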
query and __getitem__ both accept an arbitrary boolean expression and automatically resolve each name in the expression against the columns of the calling frame (you can also refer to local and global variables; in current pandas, local variables are prefixed with @). This allows you to 1) express queries more succinctly than typing df. in front of everything, 2) use syntax that, let's face it, looks better than ugly bitwise operators, 3) potentially run much faster than the "pure" Python equivalent if you have huge frames and a very complex expression, and finally 4) pass the same query to multiple frames that share a subset of columns (after all, it is just a string).
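A sketch of point 4, reusing one query string on two frames that share the time and legnth columns. It assumes current pandas, where a local variable inside the expression is referenced with an @ prefix; the second frame and the variable target are made up for illustration:

```python
import pandas as pd

df1 = pd.DataFrame({'time': [1., 2., 3., 4.], 'legnth': [4., 5., 6., 7.]},
                   index=['a', 'b', 'c', 'd'])
# Made-up second frame: same two columns plus an extra one.
df2 = pd.DataFrame({'time': [4., 2.], 'legnth': [9., 5.], 'weight': [0.5, 0.7]},
                   index=['x', 'y'])

# "@target" refers to the local Python variable, not to a column.
target = 4
expression = 'legnth == @target or time == @target'

# The same string works on both frames because they share the columns it uses.
res1 = df1.query(expression)
res2 = df2.query(expression)
print(res1)
print(res2)
```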