What is the fastest way to multiply a column by a weight chosen according to a condition?
I have two different data sets.
An if condition decides which weight is used for the multiplication.
#DF1
#R1= Score
Teacher Teach_Location Student R1
---------------------------------------------
T1 AAS S1 0.33
T1 AAS S1 0.63
T1 AAS S2 0.23
T2 LMN S3 0.73
T1 AAS S2 0.93
T1 AAS S1 0.13
T2 LMN S3 1
.
.
.
.
T24 AGC S97 0.32
# another Data frame like:
F1 F2 F3 F4 Isactive
--------------------------------------------
0.22 0.45 0.67 0.96 1
If R1 is less than or equal to 0.25, multiply it by F1; if it is between 0.25 and 0.5, multiply it by F2; and likewise for F3 and F4. I tried the function below, but it does not work as expected.
def Score(x):
    if 0 <= x['R1'] <= 0.25:
        return (Weight['F1'] * (x['R1']))
    elif 0.25 < x['R1'] <= 0.5:
        return (Weight['F2'] * int(x['R1']))
    elif 0.5 < x['R1'] <= 0.75:
        return (Weight['F3'] * int(x['R1']))
    elif 0.75 < x['R1'] <= 1:
        return (Weight['F4'] * int(x['R1']))
    return ''

DF1['R1'] = DF1.apply(Score, axis=1)
I think the issue is just that you're converting x['R1'] to int, which truncates every value below 1 to 0. Your code seems to work fine if you remove the int(...):
def Score(x):
    if 0 <= x['R1'] <= 0.25:
        return (Weight['F1'] * x['R1'])
    elif 0.25 < x['R1'] <= 0.5:
        return (Weight['F2'] * x['R1'])
    elif 0.5 < x['R1'] <= 0.75:
        return (Weight['F3'] * x['R1'])
    elif 0.75 < x['R1'] <= 1:
        return (Weight['F4'] * x['R1'])
    return ''

DF1['R1'] = DF1.apply(Score, axis=1)
# Teacher Teach_Location Student R1
# 0 T1 AAS S1 0.1485
# 1 T1 AAS S1 0.4221
# 2 T1 AAS S2 0.0506
# 3 T2 LMN S3 0.4891
# 4 T1 AAS S2 0.8928
# 5 T1 AAS S1 0.0286
# 6 T2 LMN S3 0.9600
# 7 T24 AGC S97 0.1440
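The truncation is easy to check in isolation. A minimal sketch, using a few of the R1 values from the question:

```python
# int() truncates toward zero, so every score below 1 becomes 0
scores = [0.33, 0.63, 0.93, 1]
truncated = [int(s) for s in scores]
print(truncated)  # [0, 0, 0, 1]
```

Only a score of exactly 1 survives the conversion, which is why almost every product in the original attempt came out as 0.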
The solution proposed by @tdy is fine, but I don't think it's the fastest. Another problem is that the function returns either a float or a string, which is bad practice because the resulting column ends up with mixed types. I suggest returning 0, -1 or np.nan instead.
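To illustrate the point about return types, here is a small sketch (the values are only illustrative) showing what a '' fallback does to the column dtype compared with np.nan:

```python
import numpy as np
import pandas as pd

# Mixing floats with '' forces the generic object dtype
mixed = pd.Series([0.1485, '', 0.0506])
# Using np.nan as the "no match" value keeps the column float64
clean = pd.Series([0.1485, np.nan, 0.0506])

print(mixed.dtype)  # object
print(clean.dtype)  # float64
total = clean.sum()  # NaN is skipped by default
```

With the float64 column, numeric reductions such as sum() keep working and simply skip the missing entries.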
import pandas as pd

df = pd.DataFrame({"R1":
                   [0.33, 0.63, 0.23, 0.73,
                    0.93, 0.13, 1, 0.32, 2]})
Weight = pd.DataFrame(
    {"F1": [0.22],
     "F2": [0.45],
     "F3": [0.67],
     "F4": [0.96],
     "isActive": [1]})

def Score(x):
    if 0 <= x['R1'] <= 0.25:
        return (Weight['F1'] * x['R1'])
    elif 0.25 < x['R1'] <= 0.5:
        return (Weight['F2'] * x['R1'])
    elif 0.5 < x['R1'] <= 0.75:
        return (Weight['F3'] * x['R1'])
    elif 0.75 < x['R1'] <= 1:
        return (Weight['F4'] * x['R1'])
    return ''
np.select
I suggest using np.select in the following way:
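import numpy as np

# Take the F1..F4 weights from the single row, dropping the trailing isActive column
weights = Weight[Weight.columns[:-1]].values[0]
condList = [df["R1"].ge(0) & df["R1"].le(0.25),
            df["R1"].gt(0.25) & df["R1"].le(0.5),
            df["R1"].gt(0.5) & df["R1"].le(0.75),
            df["R1"].gt(0.75) & df["R1"].le(1)]
# One candidate column per weight; np.select picks the one whose condition holds
choiceList = [df["R1"] * w for w in weights]
out = np.select(condList, choiceList, default=np.nan)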
Here np.nan is returned when none of the conditions is satisfied.
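The same idea can be wrapped in a small helper. This is only a sketch: the name score_select and its edges argument are hypothetical, and the bin edges are the ones stated in the question.

```python
import numpy as np
import pandas as pd

def score_select(r1, weights, edges=(0, 0.25, 0.5, 0.75, 1)):
    """Multiply each score by the weight of its bin (hypothetical helper)."""
    # First bin is closed on both sides; the remaining bins are (low, high]
    conds = [r1.ge(edges[0]) & r1.le(edges[1])]
    conds += [r1.gt(lo) & r1.le(hi) for lo, hi in zip(edges[1:-1], edges[2:])]
    choices = [r1 * w for w in weights]
    # Scores outside [0, 1] match no condition and become NaN
    return np.select(conds, choices, default=np.nan)

r1 = pd.Series([0.33, 0.63, 0.23, 2])
out = score_select(r1, [0.22, 0.45, 0.67, 0.96])
print(out)  # approximately [0.1485, 0.4221, 0.0506, nan]
```

The out-of-range score 2 falls through to the default, so the result stays a float array.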
%%timeit -n 10 -r 10
out = df.apply(Score, axis=1)
5.62 ms ± 1.44 ms per loop (mean ± std. dev. of 10 runs, 10 loops each)
np.select
%%timeit -n 10 -r 10
condList = [df["R1"].ge(0) & df["R1"].le(0.25),
            df["R1"].gt(0.25) & df["R1"].le(0.5),
            df["R1"].gt(0.5) & df["R1"].le(0.75),
            df["R1"].gt(0.75) & df["R1"].le(1)]
choiceList = [df["R1"] * w for w in weights]
out = np.select(condList, choiceList, default=np.nan)
3.01 ms ± 859 µs per loop (mean ± std. dev. of 10 runs, 10 loops each)
Now consider a bigger dataframe, say 1_000 times bigger:
df = pd.concat([df for i in range(1_000)], ignore_index=True)
If you run the timings again, you get
2.58 s ± 82 ms per loop (mean ± std. dev. of 10 runs, 10 loops each)
versus
3.05 ms ± 851 µs per loop (mean ± std. dev. of 10 runs, 10 loops each)
So with small datasets the difference is minor, but on a dataframe with 9,000 rows the np.select solution is ~800x faster.
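The %%timeit magic only works in IPython or Jupyter. Outside a notebook, a comparison along the same lines could be sketched with the standard timeit module. The names W, score_row, with_apply and with_select are hypothetical, and the weights are kept in a plain dict (an assumption) so that the row-wise function returns scalars. Actual timings depend on the machine, so none are quoted here.

```python
import timeit
import numpy as np
import pandas as pd

# Assumption: weights in a plain dict so the row-wise function returns scalars
W = {"F1": 0.22, "F2": 0.45, "F3": 0.67, "F4": 0.96}
base = pd.DataFrame({"R1": [0.33, 0.63, 0.23, 0.73, 0.93, 0.13, 1, 0.32, 2]})
big = pd.concat([base] * 1_000, ignore_index=True)  # 9,000 rows

def score_row(x):
    r = x["R1"]
    if 0 <= r <= 0.25:
        return W["F1"] * r
    elif 0.25 < r <= 0.5:
        return W["F2"] * r
    elif 0.5 < r <= 0.75:
        return W["F3"] * r
    elif 0.75 < r <= 1:
        return W["F4"] * r
    return np.nan  # keep the result numeric

def with_apply(df):
    return df.apply(score_row, axis=1).to_numpy()

def with_select(df):
    r = df["R1"]
    conds = [r.ge(0) & r.le(0.25), r.gt(0.25) & r.le(0.5),
             r.gt(0.5) & r.le(0.75), r.gt(0.75) & r.le(1)]
    return np.select(conds, [r * w for w in W.values()], default=np.nan)

# Check that both approaches agree before comparing their speed
same = np.allclose(with_apply(big), with_select(big), equal_nan=True)
t_apply = timeit.timeit(lambda: with_apply(big), number=3) / 3
t_select = timeit.timeit(lambda: with_select(big), number=3) / 3
print(same, f"apply: {t_apply:.4f} s, np.select: {t_select:.4f} s")
```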
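Another fast option is to use pandas cut or qcut. Note that qcut bins by the quantiles of the data rather than by fixed edges, so it reproduces the 0.25/0.5/0.75 thresholds only when the scores happen to be spread that way, as they are in this sample: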
df = pd.DataFrame([{'Teacher': 'T1', 'Teach_Location': 'AAS', 'Student': 'S1', 'R1': 0.33},
{'Teacher': 'T1', 'Teach_Location': 'AAS', 'Student': 'S1', 'R1': 0.63},
{'Teacher': 'T1', 'Teach_Location': 'AAS', 'Student': 'S2', 'R1': 0.23},
{'Teacher': 'T2', 'Teach_Location': 'LMN', 'Student': 'S3', 'R1': 0.73},
{'Teacher': 'T1', 'Teach_Location': 'AAS', 'Student': 'S2', 'R1': 0.93},
{'Teacher': 'T1', 'Teach_Location': 'AAS', 'Student': 'S1', 'R1': 0.13},
{'Teacher': 'T2', 'Teach_Location': 'LMN', 'Student': 'S3', 'R1': 1.0},
{'Teacher': 'T24', 'Teach_Location': 'AGC', 'Student': 'S97', 'R1': 0.32}])
weights = pd.DataFrame([{'F1': 0.22,
'F2': 0.45,
'F3': 0.67,
'F4': 0.96,
'Isactiv': 1}])
(df.assign(cut = pd.qcut(df.R1,
q = 4,
labels = ['F1', 'F2', 'F3', 'F4'])
)
.merge(weights.T,
left_on='cut',
right_index=True,
how='left')
.assign(product = lambda df: df.R1.mul(df.iloc[:, -1]))
)
Teacher Teach_Location Student R1 cut 0 product
0 T1 AAS S1 0.33 F2 0.45 0.1485
1 T1 AAS S1 0.63 F3 0.67 0.4221
2 T1 AAS S2 0.23 F1 0.22 0.0506
3 T2 LMN S3 0.73 F3 0.67 0.4891
4 T1 AAS S2 0.93 F4 0.96 0.8928
5 T1 AAS S1 0.13 F1 0.22 0.0286
6 T2 LMN S3 1.00 F4 0.96 0.9600
7 T24 AGC S97 0.32 F2 0.45 0.1440
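For thresholds that must stay at 0.25, 0.5 and 0.75 regardless of how the scores are distributed, pd.cut with explicit bin edges is the variant to reach for. A minimal sketch, assuming the weights are held in a Series indexed by F1..F4 (that layout is an assumption, not part of the original data):

```python
import pandas as pd

df = pd.DataFrame({"R1": [0.33, 0.63, 0.23, 0.73, 0.93, 0.13, 1.0, 0.32]})
# Assumption: weights stored as a Series keyed by label
weights = pd.Series({"F1": 0.22, "F2": 0.45, "F3": 0.67, "F4": 0.96})

# Fixed edges: [0, 0.25], (0.25, 0.5], (0.5, 0.75], (0.75, 1]
labels = pd.cut(df["R1"], bins=[0, 0.25, 0.5, 0.75, 1],
                labels=list(weights.index), include_lowest=True)

# Look up the weight for each label, then multiply element-wise
df["weight"] = labels.astype(object).map(weights)
df["product"] = df["R1"] * df["weight"]
print(df)
```

On this sample the products match the qcut output above; the two approaches diverge once the quartiles of the data drift away from the fixed thresholds.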