Count value column iteratively for rows within column
I have a dataframe that looks like this:
info_version commits commitdates
18558 17.1.3 42 2017-07-14
20783 17.1.3 57 2017-07-14
20782 17.2.2 57 2017-09-27
18557 17.2.2 42 2017-09-27
18556 17.2.3 42 2017-10-30
20781 17.2.3 57 2017-10-30
20780 17.2.4 57 2017-11-27
18555 17.2.4 42 2017-11-27
20779 17.2.5 57 2018-01-10
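I have a small problem, but somehow I can't find the right function. I want a running count of commits, starting from the first value (42) and accumulating down to the last row. The output I want looks like this: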
info_version commits commitdates Commit_growth
18558 17.1.3 42 2017-07-14 42
20783 17.1.3 57 2017-07-14 109
20782 17.2.2 57 2017-09-27 166
18557 17.2.2 42 2017-09-27 208
18556 17.2.3 42 2017-10-30 250
20781 17.2.3 57 2017-10-30 307
20780 17.2.4 57 2017-11-27 364
18555 17.2.4 42 2017-11-27 406
20779 17.2.5 57 2018-01-10 463
Here is what I have tried so far:
data2 = data1[['info_version', 'commits', 'commitdates']].sort_values(by='info_version', ascending=True)
sum_row = data2.sum(axis=0)
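But this gives me the overall total for each column. It seems easy, but I'm a bit stuck.
You can use sort_values together with cumsum. Note that the output differs from yours: 42 + 57 is 99, not 109, so the expected column in the question appears to be 10 too high from the second row onward.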
data1["commitdates"] = pd.to_datetime(data1["commitdates"])  # only if not parsed yet
data2 = (
    data1
    .loc[:, ["info_version", "commits", "commitdates"]]
    .sort_values(by=["info_version", "commitdates"])
    .assign(Commit_growth=lambda x: x["commits"].cumsum())
)
print(data2)
info_version commits commitdates Commit_growth
18558 17.1.3 42 2017-07-14 42
20783 17.1.3 57 2017-07-14 99
20782 17.2.2 57 2017-09-27 156
18557 17.2.2 42 2017-09-27 198
18556 17.2.3 42 2017-10-30 240
20781 17.2.3 57 2017-10-30 297
20780 17.2.4 57 2017-11-27 354
18555 17.2.4 42 2017-11-27 396
20779 17.2.5 57 2018-01-10 453
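To try the approach above without the original data, here is a minimal sketch that rebuilds the sample rows from the question (the index labels are copied from the question's printout; the names `total` and `last_running` are only illustrative). It also shows how the running column relates to the attempt in the question: `sum` returns a single total, while `cumsum` returns one partial total per row, and the last of those equals the overall total.

```python
import pandas as pd

# Sample rows rebuilt from the question's printout.
data1 = pd.DataFrame(
    {
        "info_version": ["17.1.3", "17.1.3", "17.2.2", "17.2.2", "17.2.3",
                         "17.2.3", "17.2.4", "17.2.4", "17.2.5"],
        "commits": [42, 57, 57, 42, 42, 57, 57, 42, 57],
        "commitdates": ["2017-07-14", "2017-07-14", "2017-09-27", "2017-09-27",
                        "2017-10-30", "2017-10-30", "2017-11-27", "2017-11-27",
                        "2018-01-10"],
    },
    index=[18558, 20783, 20782, 18557, 18556, 20781, 20780, 18555, 20779],
)
data1["commitdates"] = pd.to_datetime(data1["commitdates"])

# Sort, then add the running total as a new column.
data2 = (
    data1
    .sort_values(by=["info_version", "commitdates"])
    .assign(Commit_growth=lambda x: x["commits"].cumsum())
)

# sum() collapses the column to one number; cumsum() keeps one value per row.
total = int(data1["commits"].sum())
last_running = int(data2["Commit_growth"].iloc[-1])
print(total, last_running)
```

Both printed numbers should be the same, since the final running value has accumulated every row.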
A simple .cumsum() is enough, because the df appears to be already sorted by info_version:
data1['Commit_growth'] = data1['commits'].cumsum()
Here is the sample code:
import pandas as pd
data1 = pd.DataFrame({ 'info_version': ['17.1.3', '17.1.3', '17.2.2', '17.2.2', '17.2.3', '17.2.3', '17.2.4', '17.2.4', '17.2.5'],
'commits': [42, 57, 57, 42, 42, 57, 57, 42, 57],
'commitdates': ['2017-07-14', '2017-07-14', '2017-09-27', '2017-09-27', '2017-10-30', '2017-10-30', '2017-11-27', '2017-11-27', '2018-01-10']})
data1['Commit_growth'] = data1['commits'].cumsum()
print(data1)
OUTPUT:
info_version commits commitdates Commit_growth
0 17.1.3 42 2017-07-14 42
1 17.1.3 57 2017-07-14 99
2 17.2.2 57 2017-09-27 156
3 17.2.2 42 2017-09-27 198
4 17.2.3 42 2017-10-30 240
5 17.2.3 57 2017-10-30 297
6 17.2.4 57 2017-11-27 354
7 17.2.4 42 2017-11-27 396
8 17.2.5 57 2018-01-10 453
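The shortcut above relies on the rows already being in the desired order, because cumsum accumulates in whatever order the rows currently have. A minimal sketch of that dependence, assuming a hypothetical `shuffled` frame that holds the same rows in reverse order (built here only for illustration):

```python
import pandas as pd

data1 = pd.DataFrame({
    "info_version": ["17.1.3", "17.1.3", "17.2.2", "17.2.2", "17.2.3",
                     "17.2.3", "17.2.4", "17.2.4", "17.2.5"],
    "commits": [42, 57, 57, 42, 42, 57, 57, 42, 57],
    "commitdates": ["2017-07-14", "2017-07-14", "2017-09-27", "2017-09-27",
                    "2017-10-30", "2017-10-30", "2017-11-27", "2017-11-27",
                    "2018-01-10"],
})

# Hypothetical out-of-order input: the same rows, reversed.
shuffled = data1.iloc[::-1]

# cumsum follows the current row order, so it now starts from the last row.
unsorted_growth = shuffled["commits"].cumsum().tolist()

# Sorting first restores version/date order before accumulating.
# Note: info_version is compared as a plain string here.
ordered = (
    shuffled
    .sort_values(by=["info_version", "commitdates"])
    .reset_index(drop=True)
)
ordered["Commit_growth"] = ordered["commits"].cumsum()
print(ordered)
```

The final total is the same either way; only the intermediate values change with row order, which is why the first answer sorts before calling cumsum.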