Fastest way to multiply with column based on condition, python

Question

I'm looking for the fastest way to multiply a column by a weight that is chosen by a condition.

I have two different data sets.

The multiplication should be applied according to an if condition.

#DF1
#R1= Score

Teacher  Teach_Location   Student     R1          
---------------------------------------------
    T1         AAS           S1      0.33
    T1         AAS           S1      0.63  
    T1         AAS           S2      0.23
    T2         LMN           S3      0.73
    T1         AAS           S2      0.93
    T1         AAS           S1      0.13
    T2         LMN           S3      1
    .
    .
    .
    .
    T24     AGC              S97     0.32

# another Data frame like:
 F1      F2      F3      F4      Isactive
--------------------------------------------
 0.22   0.45    0.67    0.96        1

If R1 is at most 0.25, multiply it by F1; if it is between 0.25 and 0.5, multiply it by F2; and likewise for F3 and F4. I tried the function below, but it is not working as expected.

def Score(x):
    if 0 <= x['R1'] <= 0.25:
        return (Weight['F1'] * (x['R1']))           
    elif 0.25 < x['R1'] <= 0.5:
        return (Weight['F2'] * int(x['R1'])) 
    elif 0.5 < x['R1'] <= 0.75:
        return (Weight['F3'] * int(x['R1'])) 
    elif 0.75 < x['R1'] <= 1:
        return (Weight['F4'] * int(x['R1'])) 
    return ''

DF1['R1'] = DF1.apply(Score, axis=1)

Answer 1

I think the issue is just that you're casting x['R1'] to int, which truncates every value below 1 to 0. Your code seems to work fine if you remove the int(...):

def Score(x):
    if 0 <= x['R1'] <= 0.25:
        return (Weight['F1'] * x['R1'])
    elif 0.25 < x['R1'] <= 0.5:
        return (Weight['F2'] * x['R1'])
    elif 0.5 < x['R1'] <= 0.75:
        return (Weight['F3'] * x['R1'])
    elif 0.75 < x['R1'] <= 1:
        return (Weight['F4'] * x['R1'])
    return ''

DF1['R1'] = DF1.apply(Score, axis=1)

#   Teacher Teach_Location Student      R1
# 0      T1            AAS      S1  0.1485
# 1      T1            AAS      S1  0.4221
# 2      T1            AAS      S2  0.0506
# 3      T2            LMN      S3  0.4891
# 4      T1            AAS      S2  0.8928
# 5      T1            AAS      S1  0.0286
# 6      T2            LMN      S3  0.9600
# 7     T24            AGC     S97  0.1440
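
To see the truncation in isolation, here is a minimal sketch. The scores are the sample values from the question and 0.45 is its F2 weight; the variable names are only for illustration.

```python
# int() truncates toward zero, so any score in [0, 1) collapses to 0.
scores = [0.33, 0.63, 0.23, 0.73, 0.93, 0.13, 1]
truncated = [int(s) for s in scores]
print(truncated)

# With F2 = 0.45 the intended product for 0.33 is roughly 0.1485,
# but the int() version multiplies by 0 instead.
f2 = 0.45
intended = f2 * 0.33
with_int = f2 * int(0.33)
print(intended, with_int)
```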

Answer 2

The solution proposed by @tdy is fine, but I don't think it's the fastest. Another problem is that it returns either a float or a string, which is bad practice because the resulting column ends up with mixed types. I suggest returning 0, -1 or np.nan instead.

Dataframes

import pandas as pd
df = pd.DataFrame({"R1": 
                   [0.33, 0.63, 0.23, 0.73,
                    0.93, 0.13, 1, 0.32, 2]})

Weight = pd.DataFrame(
    {"F1": [0.22],
     "F2": [0.45],
     "F3": [0.67],
     "F4": [0.96],
     "isActive":[1]})

@tdy's solution

def Score(x):
    if 0 <= x['R1'] <= 0.25:
        return (Weight['F1'] * x['R1'])
    elif 0.25 < x['R1'] <= 0.5:
        return (Weight['F2'] * x['R1'])
    elif 0.5 < x['R1'] <= 0.75:
        return (Weight['F3'] * x['R1'])
    elif 0.75 < x['R1'] <= 1:
        return (Weight['F4'] * x['R1'])
    return ''

Using `np.select`

I suggest using np.select as follows:

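import numpy as np

# Weights F1..F4 as a flat array (drops the trailing isActive column)
ws = Weight[Weight.columns[:-1]].values[0]

# One boolean mask per score band
condList = [df["R1"].ge(0) & df["R1"].le(0.25),
           df["R1"].gt(0.25) & df["R1"].le(0.5),
           df["R1"].gt(0.5) & df["R1"].le(0.75),
           df["R1"].gt(0.75) & df["R1"].le(1)]

# One candidate result per band: R1 scaled by that band's weight
choiceList = [df["R1"] * w for w in ws]

out = np.select(condList, choiceList, default=np.nan)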

Here np.nan is returned when none of the conditions is satisfied.
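
As a sanity check, the sketch below compares the np.select result against a plain scalar version of the same rules. It assumes the weights are held in a plain list (ws) rather than in the Weight frame, and score_scalar is a hypothetical helper, not something from the answers above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"R1": [0.33, 0.63, 0.23, 0.73, 0.93, 0.13, 1, 0.32, 2]})
ws = [0.22, 0.45, 0.67, 0.96]  # F1..F4 from the question (assumed as a list)

def score_scalar(r):
    """Scalar reference: pick the weight by band, NaN outside [0, 1]."""
    if 0 <= r <= 0.25:
        return ws[0] * r
    if 0.25 < r <= 0.5:
        return ws[1] * r
    if 0.5 < r <= 0.75:
        return ws[2] * r
    if 0.75 < r <= 1:
        return ws[3] * r
    return np.nan

r1 = df["R1"]
cond_list = [
    (r1 >= 0) & (r1 <= 0.25),
    (r1 > 0.25) & (r1 <= 0.5),
    (r1 > 0.5) & (r1 <= 0.75),
    (r1 > 0.75) & (r1 <= 1),
]
choice_list = [r1 * w for w in ws]

vectorized = np.select(cond_list, choice_list, default=np.nan)
reference = r1.map(score_scalar).to_numpy()

# Both approaches should agree, including the NaN for the out-of-range 2
print(np.allclose(vectorized, reference, equal_nan=True))
```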

Timing

@tdy's solution

%%timeit -n 10 -r 10
out = df.apply(Score, axis=1)

5.62 ms ± 1.44 ms per loop (mean ± std. dev. of 10 runs, 10 loops each)

using `np.select`

%%timeit -n 10 -r 10
condList = [df["R1"].ge(0) & df["R1"].le(0.25),
           df["R1"].gt(0.25) & df["R1"].le(0.5),
           df["R1"].gt(0.5) & df["R1"].le(0.75),
           df["R1"].gt(0.75) & df["R1"].le(1)]

choiceList = [df["R1"] * w for w in ws]

out = np.select(condList, choiceList, default=np.nan)

3.01 ms ± 859 µs per loop (mean ± std. dev. of 10 runs, 10 loops each)

If you make the dataframe bigger, say 1_000 times bigger,

df = pd.concat([df for i in range(1_000)], ignore_index=True)

and run the timing again you'll get

2.58 s ± 82 ms per loop (mean ± std. dev. of 10 runs, 10 loops each)

versus

3.05 ms ± 851 µs per loop (mean ± std. dev. of 10 runs, 10 loops each)

So with small datasets it doesn't change much, but with a dataframe of 9,000 rows the np.select solution is ~800x faster.
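
If you are not in IPython, a rough equivalent of the comparison can be run with the standard timeit module. This is only a sketch: score_row, with_apply and with_select are illustrative names, the weights are plain floats rather than the Weight frame, and the absolute numbers will differ from the ones quoted above depending on machine and library versions.

```python
import timeit

import numpy as np
import pandas as pd

base = pd.DataFrame({"R1": [0.33, 0.63, 0.23, 0.73, 0.93, 0.13, 1, 0.32, 2]})
big = pd.concat([base] * 1_000, ignore_index=True)  # 9,000 rows
ws = [0.22, 0.45, 0.67, 0.96]  # F1..F4 as plain floats (assumption)

def score_row(x):
    """Row-wise version, returning NaN instead of '' when out of range."""
    r = x["R1"]
    if 0 <= r <= 0.25:
        return ws[0] * r
    if 0.25 < r <= 0.5:
        return ws[1] * r
    if 0.5 < r <= 0.75:
        return ws[2] * r
    if 0.75 < r <= 1:
        return ws[3] * r
    return np.nan

def with_apply():
    return big.apply(score_row, axis=1)

def with_select():
    r1 = big["R1"]
    conds = [
        (r1 >= 0) & (r1 <= 0.25),
        (r1 > 0.25) & (r1 <= 0.5),
        (r1 > 0.5) & (r1 <= 0.75),
        (r1 > 0.75) & (r1 <= 1),
    ]
    return np.select(conds, [r1 * w for w in ws], default=np.nan)

# Best of three single runs for each approach
t_apply = min(timeit.repeat(with_apply, number=1, repeat=3))
t_select = min(timeit.repeat(with_select, number=1, repeat=3))
print(f"apply:     {t_apply * 1e3:.2f} ms")
print(f"np.select: {t_select * 1e3:.2f} ms")
```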

Answer 3

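Another fast option is to use pandas cut or qcut. Note that qcut bins by sample quantiles, so it reproduces the fixed 0.25/0.5/0.75 thresholds only when the data happens to split that way, as it does for this sample; cut with explicit bin edges matches the thresholds in general.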

df = pd.DataFrame([{'Teacher': 'T1', 'Teach_Location': 'AAS', 'Student': 'S1', 'R1': 0.33},
                   {'Teacher': 'T1', 'Teach_Location': 'AAS', 'Student': 'S1', 'R1': 0.63},
                   {'Teacher': 'T1', 'Teach_Location': 'AAS', 'Student': 'S2', 'R1': 0.23},
                   {'Teacher': 'T2', 'Teach_Location': 'LMN', 'Student': 'S3', 'R1': 0.73},
                   {'Teacher': 'T1', 'Teach_Location': 'AAS', 'Student': 'S2', 'R1': 0.93},
                   {'Teacher': 'T1', 'Teach_Location': 'AAS', 'Student': 'S1', 'R1': 0.13},
                   {'Teacher': 'T2', 'Teach_Location': 'LMN', 'Student': 'S3', 'R1': 1.0},
                   {'Teacher': 'T24', 'Teach_Location': 'AGC', 'Student': 'S97', 'R1': 0.32}])

weights = pd.DataFrame([{'F1': 0.22, 
                         'F2': 0.45, 
                         'F3': 0.67, 
                         'F4': 0.96, 
                         'Isactive': 1}])

(df.assign(cut=pd.qcut(df.R1,
                       q=4,
                       labels=['F1', 'F2', 'F3', 'F4']))
   .merge(weights.T,
          left_on='cut',
          right_index=True,
          how='left')
   .assign(product=lambda df: df.R1.mul(df.iloc[:, -1]))
)

  Teacher Teach_Location Student    R1 cut     0  product
0      T1            AAS      S1  0.33  F2  0.45   0.1485
1      T1            AAS      S1  0.63  F3  0.67   0.4221
2      T1            AAS      S2  0.23  F1  0.22   0.0506
3      T2            LMN      S3  0.73  F3  0.67   0.4891
4      T1            AAS      S2  0.93  F4  0.96   0.8928
5      T1            AAS      S1  0.13  F1  0.22   0.0286
6      T2            LMN      S3  1.00  F4  0.96   0.9600
7     T24            AGC     S97  0.32  F2  0.45   0.1440
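
Since the thresholds in the question are fixed, the same lookup can also be sketched with pd.cut and explicit bin edges. This assumes the weights are kept in a Series indexed by F1..F4, which is my own layout and not the one used above.

```python
import pandas as pd

df = pd.DataFrame({"R1": [0.33, 0.63, 0.23, 0.73, 0.93, 0.13, 1.0, 0.32]})
weights = pd.Series({"F1": 0.22, "F2": 0.45, "F3": 0.67, "F4": 0.96})

# Fixed edges: [0, 0.25], (0.25, 0.5], (0.5, 0.75], (0.75, 1]
bins = [0, 0.25, 0.5, 0.75, 1]
labels = pd.cut(df["R1"], bins=bins,
                labels=list(weights.index), include_lowest=True)

# Look up each row's weight by its bin label, then multiply
df["weight"] = labels.map(weights).astype(float)
df["product"] = df["R1"] * df["weight"]
print(df)
```

Values outside [0, 1] fall outside every bin, so they get a missing label and a NaN product.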

Answer 1: 2, ACCEPTED, 2021-04-09 19:53:38
Answer 2: 1, 2021-04-09 21:07:30
Answer 3: 1, 2021-04-10 04:23:51
