Create a column in pandas dataframes based on conditionals
I have a pandas DataFrame as shown below:
import pandas as pd

# Initialise data of lists.
data = {'month':  [2, 3, 4, 5, 6, 7, 2, 3, 6, 5],
        'flag':   ["A", "A", "A", "A", "A", "A", "B", "B", "B", "B"],
        'month1': [4, 4, 7, 15, 11, 13, 6, 5, 6, 5],
        'value':  [100, 20, 50, 10, 65, 86, 24, 12, 1000, 200]
        }

# Create DataFrame
df = pd.DataFrame(data)

# Print the output.
print(df)
   month flag  month1  value
0      2    A       4    100
1      3    A       4     20
2      4    A       7     50
3      5    A      15     10
4      6    A      11     65
5      7    A      13     86
6      2    B       6     24
7      3    B       5     12
8      6    B       6   1000
9      5    B       5    200
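Now, for each flag (each month appears only once within a flag), I want to apply the following logic:
1) Create a variable (column) "final" and set it to 0.
2) For each month, if month1 <= max(month), add that row's value to "final" of the row whose month equals its month1. For example, in flag A the rows with month1 == 4 have values 100 and 20, so the row with month == 4 gets final = 120.
Expected output: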
   month flag  month1  value  Final
0      2    A       4    100      0
1      3    A       4     20      0
2      4    A       7     50    120
3      5    A      15     10      0
4      6    A      11     65      0
5      7    A      13     86     50
6      2    B       6     24      0
7      3    B       5     12      0
8      6    B       6   1000   1024
9      5    B       5    200    212
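Steps 1 and 2 can be transcribed almost literally into a plain loop, which is slow but transparent and handy as a reference to check faster versions against. This is a minimal sketch, assuming max(month) means the largest month within the same flag (which is what the expected output implies); the loop itself is illustrative, not the asker's code.

```python
import pandas as pd

data = {
    'month': [2, 3, 4, 5, 6, 7, 2, 3, 6, 5],
    'flag': ["A", "A", "A", "A", "A", "A", "B", "B", "B", "B"],
    'month1': [4, 4, 7, 15, 11, 13, 6, 5, 6, 5],
    'value': [100, 20, 50, 10, 65, 86, 24, 12, 1000, 200],
}
df = pd.DataFrame(data)

# Step 1: create 'final' and set it to 0
df['final'] = 0

for _, grp in df.groupby('flag'):
    # Assumption: max(month) is taken within the current flag
    max_month = grp['month'].max()
    for _, row in grp.iterrows():
        # Step 2: only rows whose month1 is within the flag's month range contribute
        if row['month1'] <= max_month:
            # Add this row's value to 'final' of the same-flag row(s) whose month == month1
            for label in grp.index[grp['month'] == row['month1']]:
                df.at[label, 'final'] += row['value']

print(df)
```

The nested loops make this quadratic per flag, so it is only meant for small frames or for verifying results.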
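Define the following functions.
A function applied to each row (within the current group):
def fn(row, tbl, maxMonth):
    # Sum 'value' over the group's rows whose month1 equals this row's month.
    # maxMonth is not used: a month1 greater than the group's max month can never
    # equal any month in the group, so the month1 <= max(month) rule holds implicitly.
    return tbl[tbl.month1 == row.month].value.sum()
A function applied to each group:
def fnGrp(grp):
    # Apply fn row by row, looking rows up within this group only
    return grp.apply(fn, axis=1, tbl=grp, maxMonth=grp.month.max())
Then, to compute the final column, group df by flag, apply fnGrp to each group, and save the result in the final column: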
df['final'] = df.groupby('flag').apply(fnGrp).reset_index(level=0, drop=True)
The result (df with the added column) is:
   month flag  month1  value  final
0      2    A       4    100      0
1      3    A       4     20      0
2      4    A       7     50    120
3      5    A      15     10      0
4      6    A      11     65      0
5      7    A      13     86     50
6      2    B       6     24      0
7      3    B       5     12      0
8      6    B       6   1000   1024
9      5    B       5    200    212
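The reset_index(level=0, drop=True) call is there because groupby(...).apply prepends the flag level to the result; how groupby.apply indexes results of functions that return the group's own index may differ depending on the pandas version. A sketch that avoids relying on that, assuming the same df: for each flag, total value per month1 once and look each row's month up with Series.map. final_for_group is a hypothetical helper name, not part of the answer above.

```python
import pandas as pd

data = {
    'month': [2, 3, 4, 5, 6, 7, 2, 3, 6, 5],
    'flag': ["A", "A", "A", "A", "A", "A", "B", "B", "B", "B"],
    'month1': [4, 4, 7, 15, 11, 13, 6, 5, 6, 5],
    'value': [100, 20, 50, 10, 65, 86, 24, 12, 1000, 200],
}
df = pd.DataFrame(data)


def final_for_group(grp):
    # Hypothetical helper: total 'value' per month1 within one flag
    sums = grp.groupby('month1')['value'].sum()
    # Look each row's month up among those totals; months with no match get 0
    return grp['month'].map(sums).fillna(0)


# Concatenate the per-flag Series; assignment aligns them on the original row labels
df['final'] = pd.concat(
    [final_for_group(grp) for _, grp in df.groupby('flag')]
).astype(int)
print(df)
```

Because the per-month1 totals are computed once per flag, each row needs a single lookup instead of the boolean filter that fn runs for every row.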
You can group by "flag" and "month1" and take the sum of "value", then merge the result with this df and fill the missing values with 0 using fillna:
new_df = df.merge(df.groupby(['flag', 'month1'])[['value']].sum(),
left_on=['flag','month'], right_index=True,
how='left', suffixes=('','_final'))\
.fillna({'value_final':0})
print (new_df)
   month flag  month1  value  value_final
0      2    A       4    100          0.0
1      3    A       4     20          0.0
2      4    A       7     50        120.0
3      5    A      15     10          0.0
4      6    A      11     65          0.0
5      7    A      13     86         50.0
6      2    B       6     24          0.0
7      3    B       5     12          0.0
8      6    B       6   1000       1024.0
9      5    B       5    200        212.0
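The left merge leaves NaN for rows with no matching month1, so value_final comes back as float (0.0, 120.0, ...). A variant of the same merge, assuming you want the column to be called final and to stay integer: DataFrame.merge also accepts a named Series as the right side, so the summed Series can be renamed and merged directly, then cast back to int after fillna.

```python
import pandas as pd

data = {
    'month': [2, 3, 4, 5, 6, 7, 2, 3, 6, 5],
    'flag': ["A", "A", "A", "A", "A", "A", "B", "B", "B", "B"],
    'month1': [4, 4, 7, 15, 11, 13, 6, 5, 6, 5],
    'value': [100, 20, 50, 10, 65, 86, 24, 12, 1000, 200],
}
df = pd.DataFrame(data)

# Total 'value' per (flag, month1); the Series name becomes the merged column's name
sums = df.groupby(['flag', 'month1'])['value'].sum().rename('final')

# Match each row's (flag, month) against the (flag, month1) index of sums
new_df = (
    df.merge(sums, left_on=['flag', 'month'], right_index=True, how='left')
      .fillna({'final': 0})      # rows with no matching month1 get 0
      .astype({'final': int})    # undo the float upcast caused by NaN
)
print(new_df)
```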