
Reset a cumulative sum based on a condition in pandas and return another cumulative sum

I have this DataFrame:

    counter  duration amount
0         1      0.08  1,235
1         2      0.36  1,170
2         3      1.04  1,222
3         4      0.81  1,207
4         5      3.99  1,109
5         6      1.20  1,261
6         7      4.24  1,068
7         8      3.07  1,098
8         9      2.08  1,215
9        10      4.09  1,043
10       11      2.95  1,176
11       12      3.96  1,038
12       13      3.95  1,119
13       14      3.92  1,074
14       15      3.91  1,076
15       16      1.50  1,224
16       17      3.65    962
17       18      3.85  1,039
18       19      3.82  1,062
19       20      3.34    917

I would like to create another column based on the following logic:

For each row, I want to compute a running sum of 'duration' over the current row and the rows below it (a lead, not a lag). The calculation should stop just before the running sum reaches 5, and I then want to return the running sum of 'amount' over the same rows.

For instance, for 'counter' 1 it should take the first 4 rows (0.08+0.36+1.04+0.81<5) and return 1,235+1,170+1,222+1,207=4834.

For 'counter' 2 it should take only 0.36+1.04+0.81<5 and return 1,170+1,222+1,207=3599.

I would appreciate any help!

Let's build the logic ourselves with a loop:

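# assumes df.amount is numeric; strings such as '1,235' must be converted first
c = df.duration.values
v = df.amount.values
result = []
lim = 5
for i in range(len(c)):
    total = 0
    value = 0
    for x, y in zip(v[i:], c[i:]):
        total += y
        if total >= lim:
            # stop before adding the row that pushes the duration sum to the limit
            break
        value += x
    # append in both cases: the limit was reached, or the rows ran out
    result.append(value)
# result
# [4834, 3599, 2429, 2316, 1109, 1261, 1068, 1098, 1215, 1043, 1176, 1038, 1119, 1074, 1076, 1224, 962, 1039, 1062, 917]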
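
The nested loop above rescans the following rows for every start row. As a possible alternative, here is a minimal sketch that does the same calculation with prefix sums and `np.searchsorted`. It is not part of the original answer: the function name `lead_amount_sums` is hypothetical, and the sketch assumes that durations are non-negative, that the limit is positive, and that 'amount' has already been converted to numbers.

```python
import numpy as np

def lead_amount_sums(duration, amount, limit=5.0):
    """For each row, sum `amount` over the current and following rows
    while the running `duration` total stays below `limit`."""
    duration = np.asarray(duration, dtype=float)
    amount = np.asarray(amount, dtype=float)

    # Prefix sums with a leading 0: prefix[k] is the total of rows 0..k-1
    dur_prefix = np.concatenate(([0.0], np.cumsum(duration)))
    amt_prefix = np.concatenate(([0.0], np.cumsum(amount)))

    # For start row i, k is the first position where the running duration
    # (dur_prefix[k] - dur_prefix[i]) reaches the limit; it is n + 1 if never
    k = np.searchsorted(dur_prefix, dur_prefix[:-1] + limit, side="left")

    # Rows i..k-2 stay below the limit; their amounts sum to
    # amt_prefix[k-1] - amt_prefix[i]
    return amt_prefix[k - 1] - amt_prefix[:-1]

# Data copied from the question's table (thousands separators removed)
duration = [0.08, 0.36, 1.04, 0.81, 3.99, 1.20, 4.24, 3.07, 2.08, 4.09,
            2.95, 3.96, 3.95, 3.92, 3.91, 1.50, 3.65, 3.85, 3.82, 3.34]
amount = [1235, 1170, 1222, 1207, 1109, 1261, 1068, 1098, 1215, 1043,
          1176, 1038, 1119, 1074, 1076, 1224, 962, 1039, 1062, 917]

sums = lead_amount_sums(duration, amount)
```

Because the prefix sums are floating-point, a window whose duration total lands almost exactly on the limit could be classified differently from the plain loop; none of the windows in this data is that close.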

I would first pass over the two columns once to get their cumulative sums.

cum_amount = df['amount'].cumsum()
cum_duration = df['duration'].cumsum()

Get a list ready for the results:

results = []

Then loop through each index (equivalent to 'counter'):

for idx in cum_duration.index:
    # keep only rows whose running duration is below 5; the largest such index closes the window
    wanted_idx = (cum_duration[cum_duration<5]).index.max()

    # read the running totals at that index
    results.append({'idx': idx, 'cum_duration': cum_duration[wanted_idx], 'cum_amount': cum_amount[wanted_idx]})

    # subtract the current row's totals so the sums restart at the next row (leads only, no lags)
    cum_amount -= cum_amount[idx]
    cum_duration -= cum_duration[idx]

Finally, put the results in a DataFrame:

pd.DataFrame(results)

    idx cum_duration    cum_amount
0   0   2.29    4834.0
1   1   2.21    3599.0
2   2   1.85    2429.0
3   3   4.80    2316.0
4   4   3.99    1109.0
5   5   1.20    1261.0
6   6   4.24    1068.0
7   7   3.07    1098.0
8   8   2.08    1215.0
9   9   4.09    1043.0
10  10  2.95    1176.0
11  11  3.96    1038.0
12  12  3.95    1119.0
13  13  3.92    1074.0
14  14  3.91    1076.0
15  15  1.50    1224.0
16  16  3.65    962.0
17  17  3.85    1039.0
18  18  3.82    1062.0
19  19  3.34    917.0
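
The question asks for the result as a new column, and its table shows 'amount' with thousands separators. The sketch below illustrates one way to cover both points with the cumulative-sum loop from this answer. It makes several assumptions that are not in the original: 'amount' is stored as text, only the first six rows of the table are used, and the column name `amount_sum` is made up for the example.

```python
import pandas as pd

# First six rows of the question's table; 'amount' is assumed to be text
df = pd.DataFrame({
    "counter": [1, 2, 3, 4, 5, 6],
    "duration": [0.08, 0.36, 1.04, 0.81, 3.99, 1.20],
    "amount": ["1,235", "1,170", "1,222", "1,207", "1,109", "1,261"],
})

# Strip the thousands separators so the column can be summed
df["amount"] = df["amount"].str.replace(",", "", regex=False).astype(int)

cum_amount = df["amount"].cumsum()
cum_duration = df["duration"].cumsum()

amount_sums = []
for idx in cum_duration.index:
    # last row whose running duration is still below 5
    wanted_idx = cum_duration[cum_duration < 5].index.max()
    amount_sums.append(int(cum_amount.loc[wanted_idx]))

    # restart both running totals at the next row
    cum_amount -= cum_amount.loc[idx]
    cum_duration -= cum_duration.loc[idx]

# 'amount_sum' is a hypothetical column name
df["amount_sum"] = amount_sums
```

Note that this loop assumes every single 'duration' value is below 5. If a row on its own met the limit, `wanted_idx` would point at an earlier row and the value for that row would be wrong, so such rows would need separate handling.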
