How to merge rows within the same dataframe in Python?

I would like to combine rows of the same dataframe. More precisely, I want to take the rows that share the same value in a specific column and collapse them into a single row. Here is an example:

I have the following dataframe:

import pandas as pd

te = {'TEAM': ['HC', 'TC', 'HC', 'BC', 'TC', 'BC'],
      'A1': [22, 25, 27, 35, 31, 41],
      'A2': [20, 50, 70, 11, 14, 12]}

df = pd.DataFrame(te, columns=['TEAM', 'A1', 'A2'])

print(df)

  TEAM  A1  A2
0   HC  22  20
1   TC  25  50
2   HC  27  70
3   BC  35  11
4   TC  31  14
5   BC  41  12

I would like to end up with one row for each of the three possible values of the column TEAM, so that the expected output looks like this:

  TEAM  A1  A2  A1(1)  A2(1)
0   HC  22  20     27     70
1   TC  25  50     31     14
2   BC  35  11     41     12

How can I do that?

This is a pivot (done here with unstack), preceded by a pre-processing step that builds the column suffixes:

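# Position of each row within its TEAM group: 0 for the first row, 1 for the second, ...
s = df.groupby('TEAM').cumcount()
# Suffix per row: '' for the first occurrence, '(n)' for later ones (False * str gives '')
m = s.astype(bool) * ('('+s.astype(str)+')')
# Move the suffix into the columns, then order the columns by suffix
df_out = df.set_index(['TEAM', m]).unstack().sort_index(level=1, axis=1).reset_index()
# Flatten the two-level column labels, e.g. ('A1', '(1)') -> 'A1(1)'
df_out.columns = df_out.columns.map(lambda x: f'{x[0]}{x[1]}')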

Output:
  TEAM  A1  A2  A1(1)  A2(1)
0   BC  35  11     41     12
1   HC  22  20     27     70
2   TC  25  50     31     14
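The same idea can be wrapped in a small reusable function. This is only a sketch: the function name `widen_by_key` is hypothetical, and it swaps the boolean-times-string trick for an explicit `map`, which should behave the same on this data. Note that the suffixes are sorted as strings, so with ten or more rows per group the column order would need extra care.

```python
import pandas as pd


def widen_by_key(df, key):
    """Spread the repeated rows of each `key` value across suffixed columns.

    Hypothetical helper, not part of pandas.
    """
    # Position of each row within its group: 0, 1, 2, ...
    pos = df.groupby(key).cumcount()
    # '' for the first occurrence, '(n)' for the n-th repeat
    suffix = pos.map(lambda n: f"({n})" if n else "")
    # Pivot the suffix level into the columns
    wide = df.set_index([key, suffix]).unstack()
    # Order columns by suffix so the unsuffixed columns come first
    wide = wide.sort_index(axis=1, level=1)
    # Flatten ('A1', '(1)') into 'A1(1)'
    wide.columns = [f"{name}{sfx}" for name, sfx in wide.columns]
    return wide.reset_index()


df = pd.DataFrame({
    "TEAM": ["HC", "TC", "HC", "BC", "TC", "BC"],
    "A1": [22, 25, 27, 35, 31, 41],
    "A2": [20, 50, 70, 11, 14, 12],
})
out = widen_by_key(df, "TEAM")
print(out)
```

As in the output above, the rows come back sorted by TEAM rather than in order of first appearance.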

If you know that each "TEAM" has exactly 2 rows, you can do:

df.drop_duplicates('TEAM', keep='first').merge(df.drop_duplicates('TEAM', keep='last'), on='TEAM', suffixes=('', '(1)'))

Output:

     TEAM    A1    A2    A1(1)    A2(1)
0    HC      22    20    27       70 
1    TC      25    50    31       14 
2    BC      35    11    41       12 

Otherwise, you might need to repeat this process in a loop, and clear the newly created columns for any "TEAM" that has no additional rows to fill them.
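A minimal sketch of the clean-up step, assuming each team has at most two rows. The data here is hypothetical: team "XC" is invented to have a single row. For such a team, keep='first' and keep='last' select the same row, so the "(1)" columns would just repeat its values unless they are blanked.

```python
import pandas as pd

# Hypothetical data: team "XC" has one row, the others have two
df = pd.DataFrame({
    "TEAM": ["HC", "TC", "HC", "TC", "XC"],
    "A1": [22, 25, 27, 31, 35],
    "A2": [20, 50, 70, 14, 11],
})

first = df.drop_duplicates("TEAM", keep="first")
last = df.drop_duplicates("TEAM", keep="last")
merged = first.merge(last, on="TEAM", suffixes=("", "(1)"))

# Number of rows each team had in the original frame
sizes = merged["TEAM"].map(df["TEAM"].value_counts())

# Keep the "(1)" values only for teams that really have a second row;
# the others become NaN (which turns these columns into floats)
for col in ["A1(1)", "A2(1)"]:
    merged[col] = merged[col].where(sizes > 1)

print(merged)
```

Handling three or more rows per team this way would need further merges, which is where the unstack-based approaches are more convenient.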

There may be a better way, but this solution scales to an arbitrary number of rows per team.

# Number the rows within each team: 1, 2, ...
df['order'] = df.groupby('TEAM').cumcount() + 1
# Move that row number into the columns
df.set_index(['TEAM','order']).unstack()
#       A1      A2
#order   1   2   1   2
#TEAM
#BC     35  41  11  12
#HC     22  27  20  70
#TC     25  31  50  14
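The result above has two-level column labels. If a flat table is preferred, the levels can be joined into single names. The sketch below assumes underscore-joined labels such as `A1_1` are acceptable (that naming scheme is an assumption, not part of the answer), and it uses `assign` so the original `df` is left without the helper column.

```python
import pandas as pd

df = pd.DataFrame({
    "TEAM": ["HC", "TC", "HC", "BC", "TC", "BC"],
    "A1": [22, 25, 27, 35, 31, 41],
    "A2": [20, 50, 70, 11, 14, 12],
})

wide = (
    df.assign(order=df.groupby("TEAM").cumcount() + 1)  # 1, 2, ... within each team
      .set_index(["TEAM", "order"])
      .unstack()
)

# Join each (column, order) pair into one label, e.g. ('A1', 2) -> 'A1_2'
wide.columns = [f"{col}_{order}" for col, order in wide.columns]
wide = wide.reset_index()
print(wide)
```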
