Fastest way to compute a weighted sum of a dataframe across rows

Question

I have a dataframe with some columns. I'd like to apply a transformation to one column and use the result as the weight for computing a weighted sum of the other columns. The way I'm doing it currently takes too long. Is there a faster way to do this?

I'm currently calculating a new column, transposing, and using df.dot, as suggested by almost all answers. Because my dataframe is extremely large, this method takes a long time.

For example, given the following df

col1  col2  col3
 0.1   0.2   0.3
 1.4   1.5   1.6
 1.9   1.8   1.7

I create a new column, weight, that is 1/col3

col1  col2  col3  weight
 0.1   0.2   0.3   3.333
 1.4   1.5   1.6   0.625
 1.9   1.8   1.7   0.588

and then I transpose and use df.dot against the weight to get

col1  col2
2.32  2.66
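
For reference, the steps described above can be reproduced with a minimal sketch like the one below. The DataFrame construction and the variable names `weight` and `result` are assumptions made for illustration; they are not code from the original post.

```python
import pandas as pd

# Example data from the question
df = pd.DataFrame({
    "col1": [0.1, 1.4, 1.9],
    "col2": [0.2, 1.5, 1.8],
    "col3": [0.3, 1.6, 1.7],
})

# The weight is the reciprocal of col3
weight = 1 / df["col3"]

# Transpose so each original column becomes a row, then take the
# dot product with the weights (aligned on the row index 0, 1, 2)
result = df[["col1", "col2"]].T.dot(weight)
print(result)
```

Here `result` is a Series indexed by `col1` and `col2`, holding one weighted sum per column.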

Answer 1

I checked the linked answers; they use DataFrame.dot rather than np.dot. I hope np.dot is faster, but with a large DataFrame and limited RAM it will probably still be slow:

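import numpy as np
import pandas as pd

# Weights are the reciprocal of col3
w = 1 / df.col3
# (n_cols, n_rows) dot (n_rows,) gives one weighted sum per column
arr = np.dot(df.to_numpy().T, w.to_numpy())

df1 = pd.DataFrame([arr], columns=df.columns)
print(df1)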
      col1     col2  col3
0  2.32598  2.66299   3.0
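
If memory is the bottleneck, one possible variation is to accumulate the weighted sum over row chunks so that only a slice of the data is converted to a NumPy array at a time. The following is a rough sketch under that assumption; the helper name `weighted_sum`, its parameters, and the chunk size are hypothetical, and it has not been benchmarked against the code above.

```python
import numpy as np
import pandas as pd


def weighted_sum(df, weight_col="col3", chunk_size=100_000):
    """Sum every other column weighted by 1 / weight_col, chunk by chunk."""
    value_cols = [c for c in df.columns if c != weight_col]
    total = np.zeros(len(value_cols))
    for start in range(0, len(df), chunk_size):
        chunk = df.iloc[start:start + chunk_size]
        w = 1.0 / chunk[weight_col].to_numpy()
        # (rows,) @ (rows, cols) -> (cols,), so no transpose is needed
        total += w @ chunk[value_cols].to_numpy()
    return pd.Series(total, index=value_cols)


# Example data from the question, with a tiny chunk size to exercise the loop
df = pd.DataFrame({
    "col1": [0.1, 1.4, 1.9],
    "col2": [0.2, 1.5, 1.8],
    "col3": [0.3, 1.6, 1.7],
})
out = weighted_sum(df, chunk_size=2)
print(out)
```

Multiplying the weight vector from the left avoids the explicit transpose, and excluding the weight column skips the extra `col3` entry that appears in the output above. Whether this is actually faster depends on the data and the available memory.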

Answer 1
0 2021-07-27 06:25:23
