Row-wise concatenation and replacing NaN with common column values
Below is the input data df1:
A B C D E F G
Messi Forward Argentina 1 Nan 5 6
Ronaldo Defender Portugal Nan 4 Nan 3
Messi Midfield Argentina Nan 5 Nan 6
Ronaldo Forward Portugal 3 Nan 2 3
Mbappe Forward France 1 3 2 5
Below is the intended output df:
A B C D E F G
Messi Forward,Midfield Argentina 1 5 5 6
Ronaldo Forward,Defender Portugal 3 4 2 3
Mbappe Forward France 1 3 2 5
My try:
df.groupby(['A','C'])['B'].agg(','.join).reset_index()
df.fillna(method='ffill')
Is there a better way to do this?
You can take the first non-missing value per group for every column except A and C, and aggregate B with join:
d = dict.fromkeys(df.columns.difference(['A','C']), 'first')
d['B'] = ','.join
df1 = df.groupby(['A','C'], sort=False, as_index=False).agg(d)
print (df1)
A C B D E F G
0 Messi Argentina Forward,Midfield 1.0 5.0 5.0 6
1 Ronaldo Portugal Defender,Forward 3.0 4.0 2.0 3
2 Mbappe France Forward 1.0 3.0 2.0 5
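The numeric columns come back as floats because they contained missing values. Chain convert_dtypes() to get integers back:
df1 = df.groupby(['A','C'], sort=False, as_index=False).agg(d).convert_dtypes()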
print (df1)
A C B D E F G
0 Messi Argentina Forward,Midfield 1 5 5 6
1 Ronaldo Portugal Defender,Forward 3 4 2 3
2 Mbappe France Forward 1 3 2 5
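For anyone who wants to reproduce this, here is a minimal self-contained sketch of the same approach. It assumes the "Nan" entries in the question are real missing values (np.nan); the names keys, agg_map and result are only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the question's input, assuming "Nan" means a real missing value.
df = pd.DataFrame({
    "A": ["Messi", "Ronaldo", "Messi", "Ronaldo", "Mbappe"],
    "B": ["Forward", "Defender", "Midfield", "Forward", "Forward"],
    "C": ["Argentina", "Portugal", "Argentina", "Portugal", "France"],
    "D": [1, np.nan, np.nan, 3, 1],
    "E": [np.nan, 4, 5, np.nan, 3],
    "F": [5, np.nan, np.nan, 2, 2],
    "G": [6, 3, 6, 3, 5],
})

keys = ["A", "C"]

# 'first' returns the first non-missing value in each group.
agg_map = dict.fromkeys(df.columns.difference(keys), "first")
# B is the one column that should be concatenated instead.
agg_map["B"] = ",".join

result = (
    df.groupby(keys, sort=False, as_index=False)  # sort=False keeps first-seen order
      .agg(agg_map)
      .convert_dtypes()                           # floats without NaN become Int64
)
print(result)
```

Note that B is joined in row order within each group, so Ronaldo ends up as "Defender,Forward" rather than "Forward,Defender" as written in the question.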
For a generic method that does not require listing the columns manually, you can use each column's dtype to decide whether to aggregate with ', '.join or 'first':
from pandas.api.types import is_string_dtype
keys = ['A', 'C']
# leave the grouping keys out of the dict, otherwise they get joined too
out = (df.groupby(keys, as_index=False)
         .agg({c: ', '.join if is_string_dtype(df[c]) else 'first'
               for c in df.columns.difference(keys)})
      )
Output:
A C B D E F G
0 Mbappe France Forward 1.0 3.0 2.0 5
1 Messi Argentina Forward, Midfield 1.0 5.0 5.0 6
2 Ronaldo Portugal Defender, Forward 3.0 4.0 2.0 3
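The dtype-based idea can also be wrapped in a small reusable function. This is only a sketch: collapse_groups and its sep parameter are hypothetical names, and it again assumes "Nan" in the question means np.nan. It leaves the grouping keys out of the aggregation so they are not joined with themselves, and it drops missing strings before joining.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_string_dtype


def collapse_groups(frame, keys, sep=", "):
    """Hypothetical helper: one row per group, strings joined, the rest 'first'."""
    agg_map = {
        col: (lambda s: sep.join(s.dropna())) if is_string_dtype(frame[col]) else "first"
        for col in frame.columns
        if col not in keys  # grouping keys are not aggregated
    }
    return frame.groupby(keys, as_index=False).agg(agg_map)


# Same input as in the question, assuming "Nan" means a real missing value.
df = pd.DataFrame({
    "A": ["Messi", "Ronaldo", "Messi", "Ronaldo", "Mbappe"],
    "B": ["Forward", "Defender", "Midfield", "Forward", "Forward"],
    "C": ["Argentina", "Portugal", "Argentina", "Portugal", "France"],
    "D": [1, np.nan, np.nan, 3, 1],
    "E": [np.nan, 4, 5, np.nan, 3],
    "F": [5, np.nan, np.nan, 2, 2],
    "G": [6, 3, 6, 3, 5],
})

out = collapse_groups(df, ["A", "C"])
print(out)
```

Because groupby sorts by the keys by default, the rows come out in alphabetical order of A; pass sort=False inside the helper if the original order matters.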