I have a dataframe with some columns. I'd like to apply some transformation to one column and use it as a weight for computing a weighted sum of the other columns. The issue is the way I'm doing it is currently taking too long. Is there a faster way to do this?
I'm currently calculating a new column, transposing, and using df.dot, as suggested by almost all answers. The issue is that I have an extremely large dataframe, so this method is taking a long time.
For example, given the following df
col1 col2 col3
0.1 0.2 0.3
1.4 1.5 1.6
1.9 1.8 1.7
I create a new column, weight, that is 1/col3:
col1 col2 col3 weight
0.1 0.2 0.3 3.333
1.4 1.5 1.6 0.625
1.9 1.8 1.7 0.588
and then I transpose and use df.dot against the weight to get
col1 col2
2.32 2.66
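For reference, the steps described above can be sketched roughly as follows. This is a reconstruction from the description, not the exact code in use; the variable names `weight` and `result` are assumptions.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "col1": [0.1, 1.4, 1.9],
    "col2": [0.2, 1.5, 1.8],
    "col3": [0.3, 1.6, 1.7],
})

# Weight is the reciprocal of col3
weight = 1 / df["col3"]

# Transpose so the value columns become rows, then take the dot
# product of each row with the weights (aligned on the row labels)
result = df[["col1", "col2"]].T.dot(weight)
print(result)
```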
I checked the linked answers, and they use DataFrame.dot rather than np.dot. I expect np.dot to be faster, but with a large DataFrame and limited RAM it will probably still be slow:
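import numpy as np
import pandas as pd

# weights are the reciprocal of col3
w = 1 / df.col3
# (n_cols, n_rows) dot (n_rows,) gives one weighted sum per column
arr = np.dot(df.to_numpy().T, w.to_numpy())
df1 = pd.DataFrame([arr], columns=df.columns)
print(df1)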
col1 col2 col3
0 2.32598 2.66299 3.0
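If memory is the limiting factor, one possible variation is to accumulate the sums over row chunks, so that only one slice is converted to a NumPy array at a time. The following is a minimal sketch, assuming all columns are numeric; `weighted_column_sums` and `chunk_rows` are hypothetical names introduced here, not part of pandas or NumPy. It also leaves the weight column out of the result, unlike the output above. Whether it is actually faster depends on the data, so it would need timing on the real DataFrame.

```python
import numpy as np
import pandas as pd

def weighted_column_sums(df, weight_col, chunk_rows=1_000_000):
    """Sum every other column weighted by 1 / df[weight_col], chunk by chunk."""
    value_cols = [c for c in df.columns if c != weight_col]
    total = np.zeros(len(value_cols))
    for start in range(0, len(df), chunk_rows):
        chunk = df.iloc[start:start + chunk_rows]
        w = 1.0 / chunk[weight_col].to_numpy()
        # (rows,) @ (rows, n_cols) -> (n_cols,), so no transpose is needed
        total += w @ chunk[value_cols].to_numpy()
    return pd.Series(total, index=value_cols)

# Sample data from the question; chunk_rows=2 only to exercise the loop
df = pd.DataFrame({
    "col1": [0.1, 1.4, 1.9],
    "col2": [0.2, 1.5, 1.8],
    "col3": [0.3, 1.6, 1.7],
})
sums = weighted_column_sums(df, "col3", chunk_rows=2)
print(sums)
```

The vector is placed on the left of the product, which gives the same per-column sums as transposing the matrix first.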