
Pandas groupby create new column based on a condition

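In the table below, I want to generate a new column, new area, for the groups formed by the address-related fields X, Y, Z (groupby XYZ). Within the code column, if the value is A, that area should be counted only once, and the areas of the other codes added to it.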

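So for this group, the new area should be 100 (A) + 200 (B) + 300 (C) = 600. Note that a plain sum does not work, because A appears twice. I only want one area for value A counted in the total, not all of them.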
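To make the target concrete, here is a minimal sketch of the arithmetic for that single group in plain Python. The variable names are illustrative only, and it assumes that repeated A rows in a group always carry the same area.

```python
# Codes and areas for the single group (222 North St, Seattle, WA) from the question.
codes = ['A', 'B', 'A', 'C']
areas = [100, 200, 100, 300]

# A plain sum counts the area of A twice.
naive_total = sum(areas)

# Count the first A only; every other code is always added.
new_area = 0
seen_a = False
for code, area in zip(codes, areas):
    if code == 'A':
        if seen_a:
            continue  # skip repeated A rows
        seen_a = True
    new_area += area

print(naive_total, new_area)  # 700 600
```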


To reproduce the table:

import pandas as pd

df = pd.DataFrame()
df['X'] = ['222 North St','222 North St','222 North St','222 North St','115 John St','115 John St','115 John St']
df['Y'] = ['Seattle','Seattle','Seattle','Seattle','Chicago','Chicago','Chicago']
df['Z'] = ['WA','WA','WA','WA','IL','IL','IL']
df['code'] = ['A','B','A','C','A','A','B']
df['area'] = [100,200,100,300,200,200,50]

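This works, but I'm not sure it is the most efficient approach. Since you didn't specify which row to take when a code appears more than once, I'm assuming the area values of the repeated rows are the same, so the duplicates are simply dropped.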

import pandas as pd 

df = pd.DataFrame()
df['X'] = ['222 North St','222 North St','222 North St','222 North St','115 John St','115 John St','115 John St']
df['Y'] = ['Seattle','Seattle','Seattle','Seattle','Chicago','Chicago','Chicago']
df['Z'] = ['WA','WA','WA','WA','IL','IL','IL']
df['code'] = ['A','B','A','C','A','A','B']
df['area'] = [100,200,100,300,200,200,50]

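# keep one row per (X, Y, Z, code), then sum area within each (X, Y, Z) group
df2 = df.drop_duplicates(subset=['X','Y','Z','code']).groupby(['X','Y','Z']).agg({'area':'sum'}).reset_index()
# attach each group's total to the original rows as 'area sum'
df = pd.merge(df,df2,how='left',on=['X','Y','Z']).rename(columns={'area_x':'area','area_y':'area sum'})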

Also, if you provide the first part of the code above yourself, more people will be inclined to try answering your question.
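One caveat with the snippet above: drop_duplicates on X, Y, Z and code removes the repeats of every code, not only A. The sketch below uses a made-up address (hypothetical data, not from the question) in which B is repeated as well, to show how the two readings of the requirement can differ.

```python
import pandas as pd

# Hypothetical data (not from the question): one address where code B repeats too.
demo = pd.DataFrame({
    'X': ['1 Main St'] * 4,
    'Y': ['Springfield'] * 4,
    'Z': ['IL'] * 4,
    'code': ['A', 'A', 'B', 'B'],
    'area': [100, 100, 50, 50],
})
keys = ['X', 'Y', 'Z']

# Deduplicating on every code keeps one A and one B: 100 + 50
all_codes_once = demo.drop_duplicates(subset=keys + ['code']).groupby(keys)['area'].sum()

# Deduplicating only code A keeps one A and both B rows: 100 + 50 + 50
is_repeat_a = (demo['code'] == 'A') & demo.duplicated(subset=keys + ['code'])
only_a_once = demo[~is_repeat_a].groupby(keys)['area'].sum()

print(all_codes_once.iloc[0], only_a_once.iloc[0])  # 150 200
```

On the data in the question the two give the same totals, because only A is ever repeated within a group.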

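Edit: to drop duplicates only where code is A, run the following on the original df in place of the df2 and merge lines above: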

# drop duplicates but only for code = A
df_A = df[df['code']=='A'].drop_duplicates(subset=['X','Y','Z','code'])

# groupby and sum now that A only appears once - this creates the 'area sum'
df2 = pd.concat([df[df['code']!='A'],df_A]).groupby(['X','Y','Z']).agg({'area':'sum'}).reset_index()

# merge onto original dataframe
df = pd.merge(df,df2,how='left',on=['X','Y','Z']).rename(columns={'area_x':'area','area_y':'area sum'})
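If the intermediate frame and the merge are not needed, the same totals can probably be obtained with groupby(...).transform. This is only a sketch of an alternative, not the answer's method. The helper names keys, repeat_a and counted are made up for illustration, and it again assumes repeated A rows in a group share the same area.

```python
import pandas as pd

df = pd.DataFrame({
    'X': ['222 North St'] * 4 + ['115 John St'] * 3,
    'Y': ['Seattle'] * 4 + ['Chicago'] * 3,
    'Z': ['WA'] * 4 + ['IL'] * 3,
    'code': ['A', 'B', 'A', 'C', 'A', 'A', 'B'],
    'area': [100, 200, 100, 300, 200, 200, 50],
})
keys = ['X', 'Y', 'Z']

# True for every A row after the first one within its (X, Y, Z) group
repeat_a = (df['code'] == 'A') & df.duplicated(subset=keys + ['code'])

# Zero out the repeated A areas, then broadcast each group's sum back to its rows
counted = df['area'].mask(repeat_a, 0)
df['area sum'] = df.assign(counted=counted).groupby(keys)['counted'].transform('sum')

print(df['area sum'].tolist())  # [600, 600, 600, 600, 250, 250, 250]
```

The first group gives 100 + 200 + 300 = 600, as in the question, and the second gives 200 + 50 = 250.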
