Row-wise concatenation and replacing NaN with common column values
Below is the input data, df1:
A B C D E F G
Messi Forward Argentina 1 Nan 5 6
Ronaldo Defender Portugal Nan 4 Nan 3
Messi Midfield Argentina Nan 5 Nan 6
Ronaldo Forward Portugal 3 Nan 2 3
Mbappe Forward France 1 3 2 5
Below is the intended output:
A B C D E F G
Messi Forward,Midfield Argentina 1 5 5 6
Ronaldo Forward,Defender Portugal 3 4 2 3
Mbappe Forward France 1 3 2 5
My attempt:
df.groupby(['A','C'])['B'].agg(','.join).reset_index()
df.fillna(method='ffill')
Is there a better way to do this?
You can take the first non-missing value per group for every column except A and C, and aggregate B with join:
d = dict.fromkeys(df.columns.difference(['A','C']), 'first')
d['B'] = ','.join
df1 = df.groupby(['A','C'], sort=False, as_index=False).agg(d)
print (df1)
A C B D E F G
0 Messi Argentina Forward,Midfield 1.0 5.0 5.0 6
1 Ronaldo Portugal Defender,Forward 3.0 4.0 2.0 3
2 Mbappe France Forward 1.0 3.0 2.0 5
D, E and F come back as floats because the missing values forced those columns to float64. Chaining convert_dtypes() restores integer dtypes:
df1 = df.groupby(['A','C'], sort=False, as_index=False).agg(d).convert_dtypes()
print (df1)
A C B D E F G
0 Messi Argentina Forward,Midfield 1 5 5 6
1 Ronaldo Portugal Defender,Forward 3 4 2 3
2 Mbappe France Forward 1 3 2 5
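To make the approach above easy to reproduce, here is a minimal end-to-end sketch. The DataFrame is rebuilt by hand from the table in the question, on the assumption that the "Nan" cells are real missing values (np.nan). The names keys, agg_map and result are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Input rebuilt from the table in the question; "Nan" cells are assumed
# to be real missing values.
df = pd.DataFrame({
    "A": ["Messi", "Ronaldo", "Messi", "Ronaldo", "Mbappe"],
    "B": ["Forward", "Defender", "Midfield", "Forward", "Forward"],
    "C": ["Argentina", "Portugal", "Argentina", "Portugal", "France"],
    "D": [1, np.nan, np.nan, 3, 1],
    "E": [np.nan, 4, 5, np.nan, 3],
    "F": [5, np.nan, np.nan, 2, 2],
    "G": [6, 3, 6, 3, 5],
})

keys = ["A", "C"]

# 'first' returns the first non-missing value in each group.
agg_map = dict.fromkeys(df.columns.difference(keys), "first")
# B is the exception: concatenate every value in the group.
agg_map["B"] = ",".join

result = (
    df.groupby(keys, sort=False, as_index=False)  # keep first-seen group order
      .agg(agg_map)
      .convert_dtypes()  # float columns without NaN become nullable integers
)
print(result)
```

Because sort=False keeps groups in order of first appearance, Messi comes first, and the joined B values follow row order within each group.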
For a generic approach that does not list the columns by hand, you can use each column's dtype to decide whether to aggregate with ', '.join or 'first':
from pandas.api.types import is_string_dtype
keys = ['A', 'C']
# Skip the grouping keys; aggregating them too joins each key with itself
# (e.g. 'Messi, Messi').
out = (df.groupby(keys, as_index=False)
   .agg({c: ', '.join if is_string_dtype(df[c]) else 'first'
         for c in df.columns.difference(keys)})
)
Output:
A C B D E F G
0 Mbappe France Forward 1.0 3.0 2.0 5
1 Messi Argentina Forward, Midfield 1.0 5.0 5.0 6
2 Ronaldo Portugal Defender, Forward 3.0 4.0 2.0 3
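The dtype-based idea can also be wrapped in a small reusable function. The sketch below is one possible way to do it, not part of the original answer: collapse_groups is a hypothetical helper name, the sample frame is rebuilt by hand from the question, and the code assumes string columns contain no missing values (str.join would fail on NaN). The grouping keys are left out of the aggregation mapping so they are not joined with themselves.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_string_dtype


def collapse_groups(frame, keys, sep=", "):
    """Hypothetical helper: return one row per combination of `keys`.

    String columns are joined with `sep`; every other column keeps its
    first non-missing value. Key columns are not aggregated.
    """
    value_cols = [c for c in frame.columns if c not in keys]
    agg_map = {
        c: sep.join if is_string_dtype(frame[c]) else "first"
        for c in value_cols
    }
    return frame.groupby(keys, sort=False, as_index=False).agg(agg_map)


# Sample data rebuilt from the question; "Nan" cells assumed to be np.nan.
df = pd.DataFrame({
    "A": ["Messi", "Ronaldo", "Messi", "Ronaldo", "Mbappe"],
    "B": ["Forward", "Defender", "Midfield", "Forward", "Forward"],
    "C": ["Argentina", "Portugal", "Argentina", "Portugal", "France"],
    "D": [1, np.nan, np.nan, 3, 1],
    "E": [np.nan, 4, 5, np.nan, 3],
    "F": [5, np.nan, np.nan, 2, 2],
    "G": [6, 3, 6, 3, 5],
})

out = collapse_groups(df, ["A", "C"])
print(out)
```

Passing sort=False here is a design choice that keeps groups in the order they first appear in the input; drop it to get the groups sorted by key instead.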