Fastest way to multiply with column based on condition, python

What is the fastest way to multiply a column by a factor that is chosen by a condition?

I have two different data sets, and the factor to multiply by depends on which range the value falls in.

#DF1
#R1= Score

Teacher  Teach_Location   Student     R1          
---------------------------------------------
    T1         AAS           S1      0.33
    T1         AAS           S1      0.63  
    T1         AAS           S2      0.23
    T2         LMN           S3      0.73
    T1         AAS           S2      0.93
    T1         AAS           S1      0.13
    T2         LMN           S3      1
    .
    .
    .
    .
    T24     AGC              S97     0.32

# another Data frame like:
 F1      F2      F3      F4      Isactive
--------------------------------------------
 0.22   0.45    0.67    0.96        1

If R1 is less than 0.25, multiply it by F1; if it is between 0.25 and 0.5, multiply it by F2; and likewise for F3 and F4. I tried the function below, but it does not work as expected.

def Score(x):
    if 0 <= x['R1'] <= 0.25:
        return (Weight['F1'] * (x['R1']))           
    elif 0.25 < x['R1'] <= 0.5:
        return (Weight['F2'] * int(x['R1'])) 
    elif 0.5 < x['R1'] <= 0.75:
        return (Weight['F3'] * int(x['R1'])) 
    elif 0.75 < x['R1'] <= 1:
        return (Weight['F4'] * int(x['R1'])) 
    return ''

DF1['R1'] = DF1.apply(Score, axis=1)

I think the issue is just that you're casting x['R1'] to int, which truncates every value below 1 to 0. Your code seems to work fine if you remove the int(...):

def Score(x):
    if 0 <= x['R1'] <= 0.25:
        return (Weight['F1'] * x['R1'])
    elif 0.25 < x['R1'] <= 0.5:
        return (Weight['F2'] * x['R1'])
    elif 0.5 < x['R1'] <= 0.75:
        return (Weight['F3'] * x['R1'])
    elif 0.75 < x['R1'] <= 1:
        return (Weight['F4'] * x['R1'])
    return ''

DF1['R1'] = DF1.apply(Score, axis=1)

#   Teacher Teach_Location Student      R1
# 0      T1            AAS      S1  0.1485
# 1      T1            AAS      S1  0.4221
# 2      T1            AAS      S2  0.0506
# 3      T2            LMN      S3  0.4891
# 4      T1            AAS      S2  0.8928
# 5      T1            AAS      S1  0.0286
# 6      T2            LMN      S3  0.9600
# 7     T24            AGC     S97  0.1440
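
To make the truncation concrete, here is a minimal check. The question does not show how Weight was built, so the plain dict below is only an assumed stand-in for it.

```python
# int() truncates toward zero, so every score below 1 becomes 0.
scores = [0.33, 0.63, 0.23, 0.73, 0.93, 0.13, 1]
truncated = [int(s) for s in scores]
print(truncated)  # [0, 0, 0, 0, 0, 0, 1]

# Assumed stand-in for the asker's Weight object (not shown in the question).
Weight = {"F1": 0.22, "F2": 0.45, "F3": 0.67, "F4": 0.96}

with_int = Weight["F2"] * int(0.33)  # the cast wipes out the score first
without_int = Weight["F2"] * 0.33    # the intended product, about 0.1485
print(with_int, without_int)
```

Only the row with R1 equal to 1 survives the cast, which is why almost every result came out as zero.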

The solution proposed by @tdy works, but I don't think it's the fastest. Another problem is that the function returns either a float or a string, which is bad practice because the resulting column ends up with object dtype. I suggest returning 0, -1 or np.nan instead.
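
A small sketch of why the mixed return type matters. The values are illustrative, taken from the expected output above, and the variable names are made up for this example.

```python
import numpy as np
import pandas as pd

# A column that mixes floats with the '' fallback is stored as Python objects.
mixed = pd.Series([0.1485, 0.4221, ""])
# Using np.nan as the fallback keeps the column numeric.
clean = pd.Series([0.1485, 0.4221, np.nan])

print(mixed.dtype)  # object
print(clean.dtype)  # float64

# Numeric reductions skip NaN by default, so the float column stays usable.
total = clean.sum()
print(total)
```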

Dataframes

import pandas as pd
df = pd.DataFrame({"R1": 
                   [0.33, 0.63, 0.23, 0.73,
                    0.93, 0.13, 1, 0.32, 2]})

Weight = pd.DataFrame(
    {"F1": [0.22],
     "F2": [0.45],
     "F3": [0.67],
     "F4": [0.96],
     "isActive":[1]})

@tdy's solution

def Score(x):
    if 0 <= x['R1'] <= 0.25:
        return (Weight['F1'] * x['R1'])
    elif 0.25 < x['R1'] <= 0.5:
        return (Weight['F2'] * x['R1'])
    elif 0.5 < x['R1'] <= 0.75:
        return (Weight['F3'] * x['R1'])
    elif 0.75 < x['R1'] <= 1:
        return (Weight['F4'] * x['R1'])
    return ''

Using np.select

I suggest using np.select as follows:

import numpy as np

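# Drop the trailing isActive column and keep the four factors as an array.
ws = Weight[Weight.columns[:-1]].values[0]

# One boolean mask per range, in the same order as the factors F1..F4.
condList = [df["R1"].ge(0) & df["R1"].le(0.25),
            df["R1"].gt(0.25) & df["R1"].le(0.5),
            df["R1"].gt(0.5) & df["R1"].le(0.75),
            df["R1"].gt(0.75) & df["R1"].le(1)]

# One candidate result per factor; np.select picks the one whose mask is True.
choiceList = [df["R1"] * w for w in ws]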


out = np.select(condList, choiceList, default=np.nan)

Here np.nan is returned for rows where none of the conditions is satisfied.
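
As a sanity check, the sketch below compares the np.select result with a plain element-by-element version of the same branches. The helper score_scalar is hypothetical and written only for this comparison, and the factors are hard-coded rather than read from Weight.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"R1": [0.33, 0.63, 0.23, 0.73, 0.93, 0.13, 1, 0.32, 2]})
ws = np.array([0.22, 0.45, 0.67, 0.96])  # F1..F4, hard-coded for the sketch


def score_scalar(r1):
    """Reference version: same ranges, one value at a time (hypothetical helper)."""
    if 0 <= r1 <= 0.25:
        return ws[0] * r1
    elif 0.25 < r1 <= 0.5:
        return ws[1] * r1
    elif 0.5 < r1 <= 0.75:
        return ws[2] * r1
    elif 0.75 < r1 <= 1:
        return ws[3] * r1
    return np.nan


r1 = df["R1"]
condList = [r1.ge(0) & r1.le(0.25),
            r1.gt(0.25) & r1.le(0.5),
            r1.gt(0.5) & r1.le(0.75),
            r1.gt(0.75) & r1.le(1)]
choiceList = [r1 * w for w in ws]

vectorized = np.select(condList, choiceList, default=np.nan)
reference = np.array([score_scalar(v) for v in r1])

# equal_nan=True treats the out-of-range row (R1 = 2) as matching.
matches = np.allclose(vectorized, reference, equal_nan=True)
print(matches)  # True
```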

Timing

@tdy's solution

%%timeit -n 10 -r 10
out = df.apply(Score, axis=1)
5.62 ms ± 1.44 ms per loop (mean ± std. dev. of 10 runs, 10 loops each)

using np.select

%%timeit -n 10 -r 10
condList = [df["R1"].ge(0) & df["R1"].le(0.25),
            df["R1"].gt(0.25) & df["R1"].le(0.5),
            df["R1"].gt(0.5) & df["R1"].le(0.75),
            df["R1"].gt(0.75) & df["R1"].le(1)]

choiceList = [df["R1"] * w for w in ws]

out = np.select(condList, choiceList, default=np.nan)
3.01 ms ± 859 µs per loop (mean ± std. dev. of 10 runs, 10 loops each)

Now take a bigger dataframe, say 1,000 times bigger:

df = pd.concat([df for i in range(1_000)], ignore_index=True)

Running the two timings again gives

2.58 s ± 82 ms per loop (mean ± std. dev. of 10 runs, 10 loops each)

versus

3.05 ms ± 851 µs per loop (mean ± std. dev. of 10 runs, 10 loops each)

So with small datasets the difference is modest, but on a dataframe with 9,000 rows the np.select solution (3.05 ms) is ~800x faster than apply (2.58 s).
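
The %%timeit magic only works in IPython or Jupyter. Outside a notebook, a comparable measurement can be sketched with the standard timeit module. The function names below are made up for this example, the factors are hard-coded, and the row function returns a plain float, so absolute numbers will differ from the ones quoted above.

```python
import timeit

import numpy as np
import pandas as pd

base = pd.DataFrame({"R1": [0.33, 0.63, 0.23, 0.73, 0.93, 0.13, 1, 0.32, 2]})
df = pd.concat([base] * 1_000, ignore_index=True)  # 9,000 rows
ws = np.array([0.22, 0.45, 0.67, 0.96])            # F1..F4, hard-coded


def score_row(x):
    # Row-wise branches, returning np.nan instead of '' when out of range.
    r1 = x["R1"]
    if 0 <= r1 <= 0.25:
        return ws[0] * r1
    elif 0.25 < r1 <= 0.5:
        return ws[1] * r1
    elif 0.5 < r1 <= 0.75:
        return ws[2] * r1
    elif 0.75 < r1 <= 1:
        return ws[3] * r1
    return np.nan


def with_apply():
    return df.apply(score_row, axis=1)


def with_select():
    r1 = df["R1"]
    conds = [r1.ge(0) & r1.le(0.25),
             r1.gt(0.25) & r1.le(0.5),
             r1.gt(0.5) & r1.le(0.75),
             r1.gt(0.75) & r1.le(1)]
    return np.select(conds, [r1 * w for w in ws], default=np.nan)


runs = 3
t_apply = timeit.timeit(with_apply, number=runs) / runs
t_select = timeit.timeit(with_select, number=runs) / runs
print(f"apply:     {t_apply * 1e3:.2f} ms per call")
print(f"np.select: {t_select * 1e3:.2f} ms per call")
```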

Another fast option is to use pandas cut or qcut :

df = pd.DataFrame([{'Teacher': 'T1', 'Teach_Location': 'AAS', 'Student': 'S1', 'R1': 0.33},
                   {'Teacher': 'T1', 'Teach_Location': 'AAS', 'Student': 'S1', 'R1': 0.63},
                   {'Teacher': 'T1', 'Teach_Location': 'AAS', 'Student': 'S2', 'R1': 0.23},
                   {'Teacher': 'T2', 'Teach_Location': 'LMN', 'Student': 'S3', 'R1': 0.73},
                   {'Teacher': 'T1', 'Teach_Location': 'AAS', 'Student': 'S2', 'R1': 0.93},
                   {'Teacher': 'T1', 'Teach_Location': 'AAS', 'Student': 'S1', 'R1': 0.13},
                   {'Teacher': 'T2', 'Teach_Location': 'LMN', 'Student': 'S3', 'R1': 1.0},
                   {'Teacher': 'T24', 'Teach_Location': 'AGC', 'Student': 'S97', 'R1': 0.32}])

weights = pd.DataFrame([{'F1': 0.22, 
                         'F2': 0.45, 
                         'F3': 0.67, 
                         'F4': 0.96, 
                         'Isactive': 1}])

(df.assign(cut = pd.qcut(df.R1,
                         q = 4,
                         labels = ['F1', 'F2', 'F3', 'F4'])
           )
   .merge(weights.T,
          left_on='cut',
          right_index=True,
          how='left')
   .assign(product = lambda df: df.R1.mul(df.iloc[:, -1]))
 )

  Teacher Teach_Location Student    R1 cut     0  product
0      T1            AAS      S1  0.33  F2  0.45   0.1485
1      T1            AAS      S1  0.63  F3  0.67   0.4221
2      T1            AAS      S2  0.23  F1  0.22   0.0506
3      T2            LMN      S3  0.73  F3  0.67   0.4891
4      T1            AAS      S2  0.93  F4  0.96   0.8928
5      T1            AAS      S1  0.13  F1  0.22   0.0286
6      T2            LMN      S3  1.00  F4  0.96   0.9600
7     T24            AGC     S97  0.32  F2  0.45   0.1440
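
One caveat on the qcut version: qcut places its bin edges at the quantiles of the data, so it reproduces the 0.25 / 0.5 / 0.75 thresholds here only because of how this sample is distributed. If the fixed thresholds from the question are what matters, pd.cut with explicit bins may be the safer choice. A minimal sketch, assuming the weights are held in a Series rather than the one-row frame above:

```python
import pandas as pd

df = pd.DataFrame({"R1": [0.33, 0.63, 0.23, 0.73, 0.93, 0.13, 1.0, 0.32]})
# Assumption: factors stored as a Series indexed by factor name.
weights = pd.Series({"F1": 0.22, "F2": 0.45, "F3": 0.67, "F4": 0.96})

# Right-closed bins (0, 0.25], (0.25, 0.5], ...; include_lowest also admits 0.
bins = [0, 0.25, 0.5, 0.75, 1]
labels = pd.cut(df["R1"], bins=bins, labels=list(weights.index),
                include_lowest=True)

# Map each label to its factor. Out-of-range values get no label and become NaN.
factor = labels.astype(str).map(weights)
df["product"] = df["R1"] * factor
print(df)
```

Values outside the 0 to 1 range end up as NaN in the product column, which matches the default=np.nan behaviour of the np.select answer.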
