Create new column containing diagonal combination of two columns in a grouped data frame
I have the following data frame:
ID Agent Capital_1 Capital_2
0 10 5 Rome Paris
1 10 5 Paris Berlin
2 20 6 Rome Paris
3 20 6 Paris Madrid
import pandas as pd
df = pd.DataFrame({'ID': [10, 10, 20, 20],
                   'Agent': [5, 5, 6, 6],
                   'Capital_1': ['Rome', 'Paris', 'Rome', 'Paris'],
                   'Capital_2': ['Paris', 'Berlin', 'Paris', 'Madrid']})
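For each group of ID and Agent, I want to create a new column Capitals containing the diagonal combination of the two columns Capital_1 and Capital_2 (the group's first Capital_1 value followed by its last Capital_2 value), as shown below: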
Expected output:
ID Agent Capital_1 Capital_2 Capitals
0 10 5 Rome Paris RomeBerlin
1 10 5 Paris Berlin RomeBerlin
2 20 6 Rome Paris RomeMadrid
3 20 6 Paris Madrid RomeMadrid
My original dataset may contain many more groups; this is just an example.
Is there a simple way to do this?
I couldn't squeeze it into a one-liner, but it may be clearer this way, with each step shown:
# group by ID/Agent, taking the first Capital_1 value and the last Capital_2 value from each group
df2 = df.groupby(["ID","Agent"]).agg({"Capital_1": "first", "Capital_2": "last"})
# the result is indexed by (ID, Agent), so it can be merged back into df
print(df2)
Capital_1 Capital_2
ID Agent
10 5 Rome Berlin
20 6 Rome Madrid
# create the Capitals column by concatenating the two strings
df2['Capitals'] = df2['Capital_1']+ df2['Capital_2']
# merge the grouped result back into the original frame; the duplicated Capital_1/Capital_2 columns coming from df2 get the suffix '_Z'
df = df.merge(df2, on=["ID", "Agent"], how="left", suffixes=('', '_Z'))
# drop the duplicated '_Z' columns
df.drop(df.filter(regex='_Z$').columns, axis=1, inplace=True)
print(df)
ID Agent Capital_1 Capital_2 Capitals
0 10 5 Rome Paris RomeBerlin
1 10 5 Paris Berlin RomeBerlin
2 20 6 Rome Paris RomeMadrid
3 20 6 Paris Madrid RomeMadrid
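If a one-liner is preferred, a possible alternative (a sketch, not part of the original answer) is GroupBy.transform, which broadcasts each group's aggregate back onto every row of that group, so the merge and the '_Z' suffix cleanup are not needed. Like the agg("first"/"last") approach above, it assumes rows within each group are already in the intended order, and 'first'/'last' return the first/last non-null value.

```python
import pandas as pd

df = pd.DataFrame({'ID': [10, 10, 20, 20],
                   'Agent': [5, 5, 6, 6],
                   'Capital_1': ['Rome', 'Paris', 'Rome', 'Paris'],
                   'Capital_2': ['Paris', 'Berlin', 'Paris', 'Madrid']})

grouped = df.groupby(['ID', 'Agent'])
# transform returns a Series aligned with df's index, one value per original row,
# so the two results can be concatenated and assigned directly
df['Capitals'] = (grouped['Capital_1'].transform('first')
                  + grouped['Capital_2'].transform('last'))
print(df)
```

This prints the same frame as the expected output and works unchanged for any number of groups or group sizes.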