引用 Pandas 中的上一行应用 function 逻辑

Question

I am trying to create a clipped cumsum.我正在尝试创建一个剪辑的 cumsum。 However, the clipping is based on the value of the previous row.但是，裁剪是基于前一行的值。

Below is an example of the data that I am trying to generate.下面是我尝试生成的数据示例。

╔══════╦═══════════════════╦═════════════════════╦════════════╦═══════════╦══════════════════╗
║ Idx  ║ datetime          ║ energy_consumption  ║ energy_pv  ║ spare_pv  ║ cumsum_spare_pv  ║
╠══════╬═══════════════════╬═════════════════════╬════════════╬═══════════╬══════════════════╣
║ 1    ║ 18/09/2021 04:00  ║ 15.5                ║ 0          ║ -15.5     ║ 42.546           ║
║ 2    ║ 18/09/2021 05:00  ║ 15.2                ║ 0          ║ -15.2     ║ 27.346           ║
║ 3    ║ 18/09/2021 06:00  ║ 15.5                ║ 0.667      ║ -14.833   ║ 12.513           ║
║ 4    ║ 18/09/2021 07:00  ║ 15.1                ║ 5.598      ║ -9.502    ║ 3.011            ║
║ 5    ║ 18/09/2021 08:00  ║ 17.1                ║ 10.638     ║ -6.462    ║ 0                ║
║ 6    ║ 18/09/2021 09:00  ║ 17.4                ║ 15.146     ║ -2.254    ║ 0                ║
║ 7    ║ 18/09/2021 10:00  ║ 17.5                ║ 19.44      ║ 1.94      ║ 1.94             ║
║ 8    ║ 18/09/2021 11:00  ║ 18.1                ║ 23.226     ║ 5.126     ║ 7.066            ║
║ 9    ║ 18/09/2021 12:00  ║ 19.5                ║ 26.246     ║ 6.746     ║ 13.812           ║
║ 10   ║ 18/09/2021 13:00  ║ 20.5                ║ 26.864     ║ 6.364     ║ 20.176           ║
║ 11   ║ 18/09/2021 14:00  ║ 19.6                ║ 24.792     ║ 5.192     ║ 25.368           ║
║ 12   ║ 18/09/2021 15:00  ║ 18.4                ║ 18.869     ║ 0.469     ║ 25.837           ║
║ 13   ║ 18/09/2021 16:00  ║ 18.3                ║ 12.319     ║ -5.981    ║ 19.856           ║
║ 14   ║ 18/09/2021 17:00  ║ 19.4                ║ 4.637      ║ -14.763   ║ 5.093            ║
║ 15   ║ 18/09/2021 18:00  ║ 18.7                ║ 0          ║ -18.7     ║ 0                ║
║ 16   ║ 18/09/2021 19:00  ║ 18.8                ║ 0          ║ -18.8     ║ 0                ║
╚══════╩═══════════════════╩═════════════════════╩════════════╩═══════════╩══════════════════╝

I have the energy consumption and PV columns.我有能耗和光伏列。 spare_pv is easy to generate using: spare_pv很容易生成使用：

df['PV'] - df['energy_consumption']

However, I cannot work out how the create the cumsum column.但是，我无法弄清楚如何创建 cumsum 列。 This cannot be done using the pd.Series.cumsum() , since logic is required to clip the value if the line above is 0 .这不能使用pd.Series.cumsum()来完成，因为如果上面的行是0 ，则需要逻辑来裁剪该值。 I have an excel formula that I am currently using to generate this function.我有一个 excel 公式，我目前正在使用它来生成这个 function。 For this example, the columns are:对于此示例，列是：

A: Idx, B: datetime, C: energy_consumption, D: energy_pv, E: spare_pv, F: cumsum_spare_pv

Row 1 would be initially calculated, then row 2 would have the formula:最初将计算第 1 行，然后第 2 行将具有以下公式：

=IF((E2+F1)<=0, 0, E2+F1)

I can do this by creating a list and looping over the DF to append values (similar to the way suggested in here ), but considering this dataset is hundreds of thousands of lines, this will be very slow.我可以通过创建一个列表并将 DF 循环到 append 值来做到这一点（类似于此处建议的方式），但考虑到这个数据集有数十万行，这将非常慢。

Answer 1

As the cumsum is conditional, I dont think there is a way around looping it.由于 cumsum 是有条件的，我认为没有办法循环它。

Instead of using iloc and for this purpose (which is slow) better to do this with numpy instead.与其使用 iloc 并为此目的（这很慢），不如使用 numpy 来代替。

D = df['spare_pv'].values

# D = np.random.random(5000000)-0.5

start_value = 42.546+15.5

new_val = start_value+D[0]
D[0] =  new_val if new_val >0 else 0

for i in range(1,len(D)):
    new_val = D[i]+D[i-1]
    D[i] = new_val if new_val >0 else 0
df['cumsum_spare_pv'] = D

Ive tried this operation on a random 5 million values as well and it completes in 3 secs or so, so should still be ok.我也对随机的 500 万个值尝试了这个操作，它在 3 秒左右完成，所以应该还可以。

引用 Pandas 中的上一行应用 function 逻辑

问题描述

1 个解决方案

解决方案1
2 已采纳 2022-09-08 13:23:00

引用 Pandas 中的上一行应用 function 逻辑

问题描述

1 个解决方案

解决方案1 2 已采纳 2022-09-08 13:23:00

解决方案1
2 已采纳 2022-09-08 13:23:00