Iteratively calculating a column of values over the rows of a column

Question

I have a dataframe that looks like this:

      info_version  commits commitdates
18558       17.1.3       42  2017-07-14
20783       17.1.3       57  2017-07-14
20782       17.2.2       57  2017-09-27
18557       17.2.2       42  2017-09-27
18556       17.2.3       42  2017-10-30
20781       17.2.3       57  2017-10-30
20780       17.2.4       57  2017-11-27
18555       17.2.4       42  2017-11-27
20779       17.2.5       57  2018-01-10

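I have a small problem, but somehow I can't find the right function. I want to add up the commits cumulatively, starting from the first value (42) through to the last row. The output I want looks like this: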

      info_version  commits commitdates  Commit_growth
18558       17.1.3       42  2017-07-14             42
20783       17.1.3       57  2017-07-14            109
20782       17.2.2       57  2017-09-27            166
18557       17.2.2       42  2017-09-27            208
18556       17.2.3       42  2017-10-30            250
20781       17.2.3       57  2017-10-30            307
20780       17.2.4       57  2017-11-27            364
18555       17.2.4       42  2017-11-27            406
20779       17.2.5       57  2018-01-10            463

This is what I have tried so far:

data2 = data1[['info_version', 'commits', 'commitdates']].sort_values(by='info_version', ascending=True)
sum_row = data2.sum(axis=0)

But this gives me the overall total for each column. It seems like it should be easy, but I'm a bit stuck.

Answer 1

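You can use sort_values together with cumsum, but the output differs from yours (42 + 57 is 99, not 109, so your expected values appear to be off by 10 from the second row on):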

import pandas as pd

data1["commitdates"] = pd.to_datetime(data1["commitdates"])  # only needed if not parsed yet

data2 = (
    data1
    .loc[:, ["info_version", "commits", "commitdates"]]
    .sort_values(by=["info_version", "commitdates"])
    .assign(Commit_growth=lambda x: x["commits"].cumsum())
)

# Output:

print(data2)

          info_version  commits commitdates  Commit_growth
    18558       17.1.3       42  2017-07-14             42
    20783       17.1.3       57  2017-07-14             99
    20782       17.2.2       57  2017-09-27            156
    18557       17.2.2       42  2017-09-27            198
    18556       17.2.3       42  2017-10-30            240
    20781       17.2.3       57  2017-10-30            297
    20780       17.2.4       57  2017-11-27            354
    18555       17.2.4       42  2017-11-27            396
    20779       17.2.5       57  2018-01-10            453
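
To illustrate why the sort comes before the cumsum in the chain above, here is a minimal sketch. The three-row frame is hypothetical (not the asker's data) and is deliberately out of order. Since cumsum accumulates in whatever order the rows currently have, the running totals change once the rows are sorted.

```python
import pandas as pd

# Hypothetical frame, deliberately out of version order
df = pd.DataFrame({
    "info_version": ["17.2.2", "17.1.3", "17.2.3"],
    "commits": [57, 42, 42],
})

# cumsum on the frame as-is follows the current row order
unsorted_growth = df["commits"].cumsum().tolist()

# Sorting first, then accumulating, follows version order instead
sorted_df = (
    df
    .sort_values(by="info_version")
    .assign(Commit_growth=lambda x: x["commits"].cumsum())
)
sorted_growth = sorted_df["Commit_growth"].tolist()

print(unsorted_growth)  # [57, 99, 141]
print(sorted_growth)    # [42, 99, 141]
```

Note that info_version is sorted as a string here, which is fine for these values but is an assumption that may not hold for every version scheme.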

Answer 2

A simple .cumsum() is enough, since the df already appears to be sorted by info_version.

data1['Commit_growth'] = data1['commits'].cumsum()

Here is the sample code:

import pandas as pd

data1 = pd.DataFrame({ 'info_version': ['17.1.3', '17.1.3', '17.2.2', '17.2.2', '17.2.3', '17.2.3', '17.2.4', '17.2.4', '17.2.5'],
                    'commits': [42, 57, 57, 42, 42, 57, 57, 42, 57],
                    'commitdates': ['2017-07-14', '2017-07-14', '2017-09-27', '2017-09-27', '2017-10-30', '2017-10-30', '2017-11-27', '2017-11-27', '2018-01-10']})

data1['Commit_growth'] = data1['commits'].cumsum()
print(data1)

OUTPUT:

  info_version  commits commitdates  Commit_growth
0       17.1.3       42  2017-07-14             42
1       17.1.3       57  2017-07-14             99
2       17.2.2       57  2017-09-27            156
3       17.2.2       42  2017-09-27            198
4       17.2.3       42  2017-10-30            240
5       17.2.3       57  2017-10-30            297
6       17.2.4       57  2017-11-27            354
7       17.2.4       42  2017-11-27            396
8       17.2.5       57  2018-01-10            453
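
As a small sketch of why the attempt in the question returned only one number per column: sum reduces a column to a single total, while cumsum keeps one running total per row. The Series below reuses the commits values from the sample code; the variable names total and running are just illustrative.

```python
import pandas as pd

# Same commits values as in the sample code above
commits = pd.Series([42, 57, 57, 42, 42, 57, 57, 42, 57], name="commits")

# sum collapses the column into one total
total = commits.sum()

# cumsum returns a running total with the same length as the column
running = commits.cumsum()

print(total)                      # 453
print(running.iloc[-1] == total)  # True
```

The last value of the running total equals the overall sum, so Commit_growth ends at the same figure that sum reports for commits.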

Answer 1: 2, 2022-11-26 15:15:22
Answer 2: 2, accepted, 2022-11-26 15:19:14
