
Count value column iteratively for rows within column

I have a dataframe that looks like this:

      info_version  commits commitdates
18558       17.1.3       42  2017-07-14
20783       17.1.3       57  2017-07-14
20782       17.2.2       57  2017-09-27
18557       17.2.2       42  2017-09-27
18556       17.2.3       42  2017-10-30
20781       17.2.3       57  2017-10-30
20780       17.2.4       57  2017-11-27
18555       17.2.4       42  2017-11-27
20779       17.2.5       57  2018-01-10

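I have a small problem, but somehow I can't find the right function. I want to count the commits cumulatively, from the first value (42) through to the last one. The output I want looks like this: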

      info_version  commits commitdates  Commit_growth
18558       17.1.3       42  2017-07-14             42
20783       17.1.3       57  2017-07-14            109
20782       17.2.2       57  2017-09-27            166
18557       17.2.2       42  2017-09-27            208
18556       17.2.3       42  2017-10-30            250
20781       17.2.3       57  2017-10-30            307
20780       17.2.4       57  2017-11-27            364
18555       17.2.4       42  2017-11-27            406
20779       17.2.5       57  2018-01-10            463

This is what I have tried so far:

data2 = data1[['info_version', 'commits', 'commitdates']].sort_values(by='info_version', ascending=True)
sum_row = data2.sum(axis=0)

But this gives me the overall total. It seems easy, but I'm a bit stuck.

You can use sort_values together with cumsum, but the output differs from yours:

data1["commitdates"] = pd.to_datetime(data1["commitdates"])  # only if not parsed yet

data2 = (
    data1
    .loc[:, ["info_version", "commits", "commitdates"]]
    .sort_values(by=["info_version", "commitdates"])
    .assign(Commit_growth=lambda x: x["commits"].cumsum())
)

#Output:

print(data2)

          info_version  commits commitdates  Commit_growth
    18558       17.1.3       42  2017-07-14             42
    20783       17.1.3       57  2017-07-14             99
    20782       17.2.2       57  2017-09-27            156
    18557       17.2.2       42  2017-09-27            198
    18556       17.2.3       42  2017-10-30            240
    20781       17.2.3       57  2017-10-30            297
    20780       17.2.4       57  2017-11-27            354
    18555       17.2.4       42  2017-11-27            396
    20779       17.2.5       57  2018-01-10            453
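
The gap between this result and the desired output in the question appears to come from a single arithmetic slip there: 42 + 57 is 99, not 109, and every later value carries the same offset. A minimal sketch that checks this, assuming the sample values from the question (the `expected` list is copied from the desired output, and the variable names are only illustrative):

```python
import pandas as pd

# Commit counts in the order shown in the question
commits = pd.Series([42, 57, 57, 42, 42, 57, 57, 42, 57])

# Commit_growth values the asker listed as the desired output
expected = pd.Series([42, 109, 166, 208, 250, 307, 364, 406, 463])

# Running total of commits, row by row
running_total = commits.cumsum()

# Difference between the asker's numbers and the running total
offset = expected - running_total
print(offset.tolist())  # [0, 10, 10, 10, 10, 10, 10, 10, 10]
```

A constant offset of 10 from the second row onward suggests the cumulative sum itself is what was wanted.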

A simple .cumsum() is enough, because it looks like the df is already sorted by info_version:

data1['Commit_growth'] = data1['commits'].cumsum()


Here is the sample code:

import pandas as pd

data1 = pd.DataFrame({ 'info_version': ['17.1.3', '17.1.3', '17.2.2', '17.2.2', '17.2.3', '17.2.3', '17.2.4', '17.2.4', '17.2.5'],
                    'commits': [42, 57, 57, 42, 42, 57, 57, 42, 57],
                    'commitdates': ['2017-07-14', '2017-07-14', '2017-09-27', '2017-09-27', '2017-10-30', '2017-10-30', '2017-11-27', '2017-11-27', '2018-01-10']})

data1['Commit_growth'] = data1['commits'].cumsum()
print(data1)

OUTPUT:

  info_version  commits commitdates  Commit_growth
0       17.1.3       42  2017-07-14             42
1       17.1.3       57  2017-07-14             99
2       17.2.2       57  2017-09-27            156
3       17.2.2       42  2017-09-27            198
4       17.2.3       42  2017-10-30            240
5       17.2.3       57  2017-10-30            297
6       17.2.4       57  2017-11-27            354
7       17.2.4       42  2017-11-27            396
8       17.2.5       57  2018-01-10            453
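
The plain `.cumsum()` relies on the rows already being in the desired order. A minimal sketch of one way to verify that first, assuming version strings made only of dot-separated integers; `version_key` and `is_version_sorted` are hypothetical helpers, not pandas functions. Versions are compared numerically because plain string comparison would place "17.10.0" before "17.2.2":

```python
import pandas as pd

def version_key(version):
    """Turn '17.2.3' into (17, 2, 3) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))

def is_version_sorted(versions):
    """Return True if the versions are in non-decreasing numeric order."""
    keys = [version_key(v) for v in versions]
    return all(a <= b for a, b in zip(keys, keys[1:]))

data1 = pd.DataFrame({
    "info_version": ["17.1.3", "17.1.3", "17.2.2", "17.2.2", "17.2.3",
                     "17.2.3", "17.2.4", "17.2.4", "17.2.5"],
    "commits": [42, 57, 57, 42, 42, 57, 57, 42, 57],
})

if not is_version_sorted(data1["info_version"]):
    # Python's sorted() is stable, so rows keep their order within a version
    keys = [version_key(v) for v in data1["info_version"]]
    order = sorted(range(len(keys)), key=keys.__getitem__)
    data1 = data1.iloc[order].copy()

data1["Commit_growth"] = data1["commits"].cumsum()
print(data1)
```

With the sample data the check passes, so the reordering branch is skipped and the result matches the output above.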

