
Combine pandas dataframe cells in case of identical values

I'm trying to build a new dataframe in which, whenever a 'type' occurs more than once, the 'country' and 'year' cells of those rows are combined into a single row. The 'how' column follows the 'type' column: rows with the same type also have the same how.

My pandas dataframe, df, looks as follows:

   type   country   year   how
0  't1'    'UK'    '2009'  'S' 
1  't2'    'GER'   '2010'  'D'
2  't2'    'USA'   '2011'  'D'
3  't3'    'AUS'   '2012'  'F'
4  't4'    'CAN'   '2013'  'R'
5  't5'    'SA'    '2014'  'L'
6  't5'    'RU'    '2015'  'L'

df2 should look like this:

   type   country        year         how
0  't1'    'UK'         '2009'        'S' 
1  't2'    'GER, USA'   '2010, 2011'  'D'
2  't3'    'AUS'        '2012'        'F'
3  't4'    'CAN'        '2013'        'R'
4  't5'    'SA, RU'     '2014, 2015'  'L'

I'm pretty sure a groupby on 'type' (or on 'type' and 'how') is necessary. Using first(), for example, drops the second of the rows that share a type. Is there a handy way to combine the cells (strings) instead? Thanks in advance.

Use groupby/agg with ', '.join as the aggregator:

import pandas as pd
df = pd.DataFrame({'country': ['UK', 'GER', 'USA', 'AUS', 'CAN', 'SA', 'RU'],
 'how': ['S', 'D', 'D', 'F', 'R', 'L', 'L'],
 'type': ['t1', 't2', 't2', 't3', 't4', 't5', 't5'],
 'year': ['2009', '2010', '2011', '2012', '2013', '2014', '2015']})

result = df.groupby(['type','how']).agg(', '.join).reset_index()

yields

  type how   country        year
0   t1   S        UK        2009
1   t2   D  GER, USA  2010, 2011
2   t3   F       AUS        2012
3   t4   R       CAN        2013
4   t5   L    SA, RU  2014, 2015
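If the column order from the question matters, one possible variation is to group on 'type' alone and pass a per-column mapping to agg. This is only a sketch: it assumes 'how' really is constant within each type (as the question states), so taking the first value loses nothing. The name df2 is taken from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'type': ['t1', 't2', 't2', 't3', 't4', 't5', 't5'],
    'country': ['UK', 'GER', 'USA', 'AUS', 'CAN', 'SA', 'RU'],
    'year': ['2009', '2010', '2011', '2012', '2013', '2014', '2015'],
    'how': ['S', 'D', 'D', 'F', 'R', 'L', 'L'],
})

# Join the strings in 'country' and 'year'; keep the first 'how' per type
# (assumption: 'how' is identical for all rows of the same type).
df2 = (
    df.groupby('type', as_index=False)
      .agg({'country': ', '.join, 'year': ', '.join, 'how': 'first'})
)
print(df2)
```

Note that ', '.join only works on strings. If 'year' were stored as integers, it would have to be converted first, for example with df['year'].astype(str).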

To get a list in each cell instead of a string:

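def proc_df(df):
    # Keep only the columns whose values should be collected into lists.
    df = df[['country', 'year']]
    # Transpose so each column becomes one list, indexed by column name.
    return pd.Series(df.T.values.tolist(), df.columns)

# One row per (how, type) group, with a list in each cell.
df.groupby(['how', 'type']).apply(proc_df)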
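A shorter route to the same kind of result may be to pass the built-in list as the aggregator, mirroring the ', '.join approach. This is a sketch rather than the answer's own method; the variable name lists is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'type': ['t1', 't2', 't2', 't3', 't4', 't5', 't5'],
    'country': ['UK', 'GER', 'USA', 'AUS', 'CAN', 'SA', 'RU'],
    'year': ['2009', '2010', '2011', '2012', '2013', '2014', '2015'],
    'how': ['S', 'D', 'D', 'F', 'R', 'L', 'L'],
})

# Collect each group's values into a Python list per cell.
lists = df.groupby(['type', 'how']).agg(list).reset_index()
print(lists)
```

Unlike ', '.join, this does not require the values to be strings, so it also works if 'year' is numeric.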

