pandas: Calculate values for a new column by applying criteria across multiple columns, without looping
My data is in the following DataFrame:
```python
import pandas as pd

df = pd.DataFrame({'AccID': ['001', '001', '001', '002', '002', '003'],
                   'AccTypes': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'Status': ['Closed', 'Active', 'Active', 'Active', 'Closed', 'Active'],
                   'Years': [5, 15, 10, 20, 25, 30]})
```
```
AccID AccTypes Status Years
001   A        Closed  5
001   B        Active 15
001   C        Active 10
002   A        Active 20
002   B        Closed 25
003   C        Active 30
```
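I want to create another column named `ActiveYears` whose value, for each Active row, is the maximum Active `Years` for that `AccID`, regardless of `AccTypes` (Closed rows simply keep their own `Years`). The expected output is as follows: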
```
AccID AccTypes Status Years ActiveYears Explanations
001   A        Closed  5     5          # Status = Closed, we set ActiveYears = Years
001   B        Active 15    15          # Status = Active, we select the maximum year of AccID = 001 with active status
001   C        Active 10    15          # Status = Active, we select the maximum year of AccID = 001 with active status
002   A        Active 20    20          # Status = Active, we select the maximum year of AccID = 002 with active status
002   B        Closed 25    25          # Status = Closed, we set ActiveYears = Years
003   C        Active 30    30          # Status = Active, we select the maximum year of AccID = 003 with active status
```
I can do this with a loop, but that isn't very elegant. Is there a better way than looping? Thanks.
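The question does not show its loop, so the version below is a hypothetical reconstruction of the rule from the expected-output table, not the asker's code. It is mainly useful as a slow but obvious reference to check vectorized answers against.

```python
import pandas as pd

df = pd.DataFrame({'AccID': ['001', '001', '001', '002', '002', '003'],
                   'AccTypes': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'Status': ['Closed', 'Active', 'Active', 'Active', 'Closed', 'Active'],
                   'Years': [5, 15, 10, 20, 25, 30]})

# Hypothetical row-by-row version of the rule stated in the question:
# Closed rows keep their own Years; Active rows take the maximum Years
# among the Active rows that share their AccID.
active_years = []
for _, row in df.iterrows():
    if row['Status'] == 'Closed':
        active_years.append(row['Years'])
    else:
        same_acc_active = df[(df['AccID'] == row['AccID']) & (df['Status'] == 'Active')]
        active_years.append(same_acc_active['Years'].max())

df['ActiveYears'] = active_years
print(df['ActiveYears'].tolist())  # [5, 15, 15, 20, 25, 30]
```

Each Active row re-filters the whole frame, so this is quadratic in the number of rows, which is one concrete reason to prefer the groupby approach.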
You can use the following:
First, handle the `Closed` status:
```python
# Closed rows: ActiveYears is simply that row's own Years
df.loc[df.Status == 'Closed', 'ActiveYears'] = df.loc[df.Status == 'Closed', 'Years']
```
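After this first step the new column exists but is filled only for `Closed` rows. The sketch below, which rebuilds the question's DataFrame, shows that intermediate state; it illustrates why the final output prints `5.0`, `15.0` and so on: when `.loc` creates a new column for only some rows, pandas fills the other rows with NaN, which forces a float dtype.

```python
import pandas as pd

df = pd.DataFrame({'AccID': ['001', '001', '001', '002', '002', '003'],
                   'AccTypes': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'Status': ['Closed', 'Active', 'Active', 'Active', 'Closed', 'Active'],
                   'Years': [5, 15, 10, 20, 25, 30]})

closed = df['Status'] == 'Closed'
# Assigning to a not-yet-existing column through .loc creates it; rows not
# selected by the mask are filled with NaN, so the column becomes float.
df.loc[closed, 'ActiveYears'] = df.loc[closed, 'Years']

print(df['ActiveYears'].tolist())  # [5.0, nan, nan, nan, 25.0, nan]
print(df['ActiveYears'].dtype)     # float64
```

The NaN rows are exactly the Active rows that the groupby step fills in next.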
Then use a `groupby` transform to handle the `Active` rows:
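```python
# Active rows: max Years over the Active rows of the same AccID.
# transform returns a Series indexed like the filtered rows, so .loc aligns it by index.
# Passing the string 'max' avoids the FutureWarning newer pandas emits for the builtin max.
df.loc[df.Status == 'Active', 'ActiveYears'] = df[df.Status == 'Active'].groupby('AccID')['Years'].transform('max')
print(df)
```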
```
  AccID AccTypes  Status  Years  ActiveYears
0   001        A  Closed      5          5.0
1   001        B  Active     15         15.0
2   001        C  Active     10         15.0
3   002        A  Active     20         20.0
4   002        B  Closed     25         25.0
5   003        C  Active     30         30.0
```
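If you would rather build the column in one assignment and keep an integer dtype, a variant using `Series.where` together with `groupby(...).transform('max')` can be sketched as follows. This is a swapped-in alternative, not the answer's code, and it assumes `Status` only ever holds `Active` or `Closed`: any other status is treated like `Closed` here, whereas the two-step answer would leave NaN for it.

```python
import pandas as pd

df = pd.DataFrame({'AccID': ['001', '001', '001', '002', '002', '003'],
                   'AccTypes': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'Status': ['Closed', 'Active', 'Active', 'Active', 'Closed', 'Active'],
                   'Years': [5, 15, 10, 20, 25, 30]})

is_active = df['Status'] == 'Active'
# Blank out non-Active rows so the per-AccID max only sees Active Years
# (NaN is skipped by 'max'), then broadcast that max back to every row.
active_max = df['Years'].where(is_active).groupby(df['AccID']).transform('max')
# Active rows take the group max; every other row keeps its own Years.
# Every row is now filled, so converting back to int is safe.
df['ActiveYears'] = active_max.where(is_active, df['Years']).astype(int)

print(df['ActiveYears'].tolist())  # [5, 15, 15, 20, 25, 30]
```

An `AccID` with no Active rows gets NaN from the transform, but those rows are all non-Active, so `where` replaces them with their own `Years` before the cast.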
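Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.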