
How do I perform the following Python operation in a pandas DataFrame?

Below is my DataFrame:

import pandas as pd

df = pd.DataFrame({
    'Year': [2021, 2022, 2023, 2024, 2025],
    'Tval': [1, 9, 8, 1, 6],
})

I want to create a new column whose output matches the attached snapshot.

In snapshot one, the multipliers (2.3, 1.2, 1.3, 2.6 and 1.13) are randomly generated; likewise for snapshots two and three.

What is the most efficient way to perform this operation? This is a simplified version of the original problem, which has over 30k rows. I could use a loop, but it would be very inefficient.

[Image: snapshots of the expected output]

You want the value in each row to be the sum of that row and all subsequent rows, each multiplied by a random value (with the random values redrawn for every row). You can do that as follows:

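import numpy as np
# Reverse the row order so the expanding window grows from the last row upward.
values = df.sort_index(ascending=False)['Tval']
# Multiply each window by freshly drawn weights in [0, 1) and sum the products.
values = values.expanding().apply(lambda x: np.sum(x * np.random.random(size=len(x))))
df["values"] = values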

Result:

   Year  Tval     values
0  2021     1  10.342499
1  2022     9  15.595990
2  2023     8  11.491088
3  2024     1   5.447966
4  2025     6   3.689064

Explanation:

  • Reverse the row order so that expanding() operates on all rows for the first index and on a single row for the last.
  • Apply expanding() to sum the rows of greater or equal index, randomly weighted. The weights are redrawn on each iteration.
  • Add "values" to the original DataFrame. Assignment aligns on index values, so there is no need to re-sort the series before adding it to df.
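
The same steps can be spelled out as a plain loop, which is handy as a reference when checking results on small inputs. This is only a sketch: the function name `tail_weighted_sums` and the use of `np.random.default_rng` are assumptions made for illustration, and the loop is not meant to replace the expanding() version.

```python
import numpy as np
import pandas as pd


def tail_weighted_sums(tval, rng):
    """For each position i, sum tval[i:] after multiplying by fresh random weights."""
    tval = np.asarray(tval, dtype=float)
    out = np.empty(len(tval))
    for i in range(len(tval)):
        tail = tval[i:]                    # this row and every later row
        weights = rng.random(len(tail))    # redrawn for every row, values in [0, 1)
        out[i] = np.sum(tail * weights)
    return out


df = pd.DataFrame({"Year": [2021, 2022, 2023, 2024, 2025],
                   "Tval": [1, 9, 8, 1, 6]})
rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable
df["values"] = tail_weighted_sums(df["Tval"], rng)
print(df)
```

Because the weights lie in [0, 1), each result is bounded above by the unweighted sum of the same rows.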

As a sanity check, remove the random weighting and observe that this reduces to a reverse cumsum operation:

values = df.sort_index(ascending=False)['Tval']
values = values.expanding().apply(sum)
df["values"] = values
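
One way to confirm the claim is to compare the unweighted expanding() result with a direct reverse cumulative sum. The sketch below does that on the sample data; the column names `via_expanding` and `via_cumsum` are made up for this comparison.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Year": [2021, 2022, 2023, 2024, 2025],
                   "Tval": [1, 9, 8, 1, 6]})

# Unweighted version of the expanding approach on the reversed rows.
rev = df.sort_index(ascending=False)["Tval"]
df["via_expanding"] = rev.expanding().apply(np.sum, raw=True)

# Direct reverse cumulative sum: flip, cumsum, flip back.
df["via_cumsum"] = df["Tval"].iloc[::-1].cumsum().iloc[::-1]

print(df)
```

Both columns should hold the same totals, with the first row summing every Tval and the last row holding only its own.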

A similar solution can be used if the weights need not change between iterations. As one of the other solutions suggests, you could also pre-calculate all the random weights and take an inner product. This will be memory inefficient but may be significantly faster, as apply is not vectorized.
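
For the case where the weights need not change, a minimal sketch follows. It assumes that row i uses the same weight w[i] in every sum it takes part in, so the result collapses to a reverse cumulative sum of Tval * w. The names `w` and `fixed` are hypothetical, and this does not reproduce the per-row redraw used above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Year": [2021, 2022, 2023, 2024, 2025],
                   "Tval": [1, 9, 8, 1, 6]})

rng = np.random.default_rng(0)   # seeded only so the sketch is repeatable
w = rng.random(len(df))          # one weight per row, drawn once

weighted = df["Tval"].to_numpy() * w
# Reverse cumulative sum: row i gets the sum of weighted[i:].
df["fixed"] = weighted[::-1].cumsum()[::-1]
print(df)
```

This variant needs only O(n) memory and no Python-level loop, which matters at 30k rows.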

The operation you're performing is a dot product, in which you can account for the shrinking window of data by setting the unused weights to 0:

import numpy as np

weights = np.random.rand(5, 5)
weights = np.tril(weights)  # zero every entry above the diagonal

print(weights)
[[0.80446016 0.         0.         0.         0.        ]
 [0.38560755 0.45014049 0.         0.         0.        ]
 [0.61068876 0.91918189 0.66418596 0.         0.        ]
 [0.78442001 0.63551564 0.35635216 0.14712083 0.        ]
 [0.54315584 0.20083916 0.28262627 0.01919842 0.58714358]]

For each column of weights, the product df["Tval"] @ weights multiplies that column element-wise by the values of df["Tval"] and sums the products. The first column is fully populated, so every value of df["Tval"] contributes. In the second column the first entry is 0, so the first value of df["Tval"] is ignored and only the remaining values are multiplied and summed. The same pattern continues for the later columns.
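
To check that the matrix product sums over exactly the rows at or after each position, it can be compared against an explicit loop over the columns of the lower-triangular matrix. This is a sketch on the sample data; `via_matmul` and `via_loop` are names chosen here, and the seeded generator is an assumption used only for repeatability.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Year": [2021, 2022, 2023, 2024, 2025],
                   "Tval": [1, 9, 8, 1, 6]})
tval = df["Tval"].to_numpy()

rng = np.random.default_rng(0)
weights = np.tril(rng.random((5, 5)))   # zeros above the diagonal

# Vectorized: entry j is sum over i of tval[i] * weights[i, j].
via_matmul = tval @ weights

# Explicit: column j only has nonzero weights for rows i >= j.
via_loop = np.array([
    np.sum(tval[j:] * weights[j:, j])
    for j in range(len(tval))
])

print(np.allclose(via_matmul, via_loop))
```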

df["value"] = df["Tval"] @ weights
print(df)
   Year  Tval      value
0  2021     1  19.181775
1  2022     9  11.324420
2  2023     8   7.936429
3  2024     1   5.792162
4  2025     6   5.243747
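
At the 30k-row scale mentioned in the question, a full 30,000 x 30,000 float64 weight matrix would take roughly 7.2 GB. One possible workaround is to build the weights a block of columns at a time. The sketch below illustrates that idea; the function name `tail_weighted_sums_chunked`, the `chunk` size and the `seed` parameter are all assumptions, not part of either answer.

```python
import numpy as np


def tail_weighted_sums_chunked(tval, chunk=1024, seed=None):
    """Compute tval @ tril(random weights) without holding the full matrix."""
    rng = np.random.default_rng(seed)
    tval = np.asarray(tval, dtype=float)
    n = len(tval)
    out = np.empty(n)
    for start in range(0, n, chunk):
        stop = min(start + chunk, n)
        # Weights for rows start..n-1 and output positions start..stop-1.
        # Rows before `start` would be zeroed anyway, so they are skipped.
        w = rng.random((n - start, stop - start))
        # Keep local entries with row >= column, i.e. global row >= global column.
        w = np.tril(w)
        out[start:stop] = tval[start:] @ w
    return out


result = tail_weighted_sums_chunked([1, 9, 8, 1, 6], chunk=2, seed=0)
print(result)
```

Peak memory is then about (number of rows) x chunk values instead of the full square matrix, at the cost of a short Python loop over the blocks.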
