
Dataframe - how to run calculations without using for loop?

I have a pandas DataFrame:

df = pd.DataFrame({"A": [10,20,30], "B": [20, 30, 10], "C": [20, 30, 10]})
df
    A   B   C
0  10  20  20
1  20  30  30
2  30  10  10

and an ndarray w = np.array([0.2, 0.3, 0.4]).

How do I add a column D whose value is the dot product of each row and w?

That is, the value of D[0] should be np.dot(df.iloc[0], w) = 16.

Likewise, the value of D[1] should be np.dot(df.iloc[1], w) = 25.

(I am thinking of the apply() function but am not sure how to use it, and a for loop might be inefficient.)

Thanks,

You can do that with pandas.DataFrame.apply, applying the function to each row (axis=1):

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame({"A": [10,20,30], "B": [20, 30, 10], "C": [20, 30, 10]})
>>> w = np.array([0.2, 0.3, 0.4])
>>> df["D"] = df.apply(lambda p: np.dot(p.values, w), axis=1)
>>> df
    A   B   C     D
0  10  20  20  16.0
1  20  30  30  25.0
2  30  10  10  13.0

For efficiency, though, you are probably better off converting the DataFrame to an ndarray and using matrix multiplication with numpy's matmul:

df["D"] = np.matmul(df.values, w)
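
One caveat with the line above: if df already contains a D column (for example after running the apply version first), df.values has four columns and the product with a length-3 w fails. A minimal sketch that selects the input columns explicitly, and also shows the @ operator and DataFrame.dot as equivalent spellings; the names cols, d_matmul, d_operator and d_pandas are made up for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [10, 20, 30], "B": [20, 30, 10], "C": [20, 30, 10]})
w = np.array([0.2, 0.3, 0.4])

# Select the input columns explicitly so an existing "D" column
# is never fed back into the product when the code is re-run.
cols = ["A", "B", "C"]

d_matmul = np.matmul(df[cols].to_numpy(), w)  # ndarray of shape (3,)
d_operator = df[cols].to_numpy() @ w          # same product via the @ operator
d_pandas = df[cols].dot(w)                    # pandas Series indexed like df

df["D"] = d_matmul
print(df)
```

All three give the same values; DataFrame.dot returns a Series rather than a bare array, which may be convenient when the index matters.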

You can also use a vectorized approach that exploits numpy broadcasting:

df['D'] = np.sum(df.to_numpy() * w, axis=1)
# .to_numpy() was added in pandas 0.24; on earlier versions use .values instead

df
    A   B   C     D
0  10  20  20  16.0
1  20  30  30  25.0
2  30  10  10  13.0
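
To see why the broadcast version works, it may help to look at the intermediate array. The sketch below uses a plain ndarray X holding the same numbers as the A, B, C columns (X, weighted and row_sums are illustrative names, not from the answer):

```python
import numpy as np

X = np.array([[10, 20, 20],
              [20, 30, 30],
              [30, 10, 10]])
w = np.array([0.2, 0.3, 0.4])

# X has shape (3, 3) and w has shape (3,). Broadcasting lines w up with
# the last axis of X, so every row is multiplied element-wise by w.
weighted = X * w
print(weighted.shape)  # (3, 3)

# Summing across the columns (axis=1) collapses each row to its dot product with w.
row_sums = weighted.sum(axis=1)
print(row_sums)
```

The first row of weighted is 10*0.2, 20*0.3, 20*0.4, which sums to the 16 shown in the D column above.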

I ran a performance comparison with %timeit in the Spyder editor. Here is what I got, ordered from slowest to fastest:

%timeit (df * w).sum(axis=1)
2.15 ms ± 590 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit df.apply(lambda p: np.dot(p.values, w), axis=1)
900 µs ± 76.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit np.sum((df.to_numpy() * w), axis=1)
19.2 µs ± 481 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
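
A 3-row frame mostly measures fixed overhead, so the ranking may shift with size. Outside IPython, a rough comparison can be sketched with the standard-library timeit module; the 2,000-row random frame (big), the candidates dictionary and the repeat counts below are arbitrary choices for illustration, and the numbers will differ from machine to machine:

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical larger input: 2,000 rows of random values in columns A, B, C.
rng = np.random.default_rng(0)
big = pd.DataFrame(rng.random((2_000, 3)), columns=["A", "B", "C"])
w = np.array([0.2, 0.3, 0.4])

candidates = {
    "apply": lambda: big.apply(lambda p: np.dot(p.values, w), axis=1),
    "df * w": lambda: (big * w).sum(axis=1),
    "broadcast": lambda: np.sum(big.to_numpy() * w, axis=1),
    "matmul": lambda: np.matmul(big.to_numpy(), w),
}

# Best of 3 repeats, 3 calls per repeat, reported as seconds per call.
timings = {
    name: min(timeit.repeat(fn, number=3, repeat=3)) / 3
    for name, fn in candidates.items()
}

# Print from slowest to fastest.
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:>10}: {seconds * 1e6:10.1f} µs per call")
```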
