Combine pandas dataframe cells in case of identical values

Question

I'm trying to make a new dataframe where, if a 'type' occurs more than once, the contents of the 'country' cells and the 'year' cells of those rows are combined into one row. (The 'how' column follows the 'type' column: rows with identical types also have identical hows.)

My pandas DataFrame, df, looks as follows:

   type   country   year   how
0  't1'    'UK'    '2009'  'S' 
1  't2'    'GER'   '2010'  'D'
2  't2'    'USA'   '2011'  'D'
3  't3'    'AUS'   '2012'  'F'
4  't4'    'CAN'   '2013'  'R'
5  't5'    'SA'    '2014'  'L'
6  't5'    'RU'    '2015'  'L'

df2 should look like this:

   type   country        year         how
0  't1'    'UK'         '2009'        'S' 
1  't2'    'GER, USA'   '2010, 2011'  'D'
2  't3'    'AUS'        '2012'        'F'
3  't4'    'CAN'        '2013'        'R'
4  't5'    'SA, RU'     '2014, 2015'  'L'

I'm pretty sure a groupby on 'type' (or on 'type' and 'how') is necessary. Using first(), for example, drops the second of the rows that share a type. Is there a handy way to combine the cells (strings) instead? Thanks in advance.
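A minimal sketch reproducing the first() behaviour described above with the sample data, to show why it is not enough: first() keeps one value per column per group rather than combining them.

```python
import pandas as pd

df = pd.DataFrame({'type': ['t1', 't2', 't2', 't3', 't4', 't5', 't5'],
                   'country': ['UK', 'GER', 'USA', 'AUS', 'CAN', 'SA', 'RU'],
                   'year': ['2009', '2010', '2011', '2012', '2013', '2014', '2015'],
                   'how': ['S', 'D', 'D', 'F', 'R', 'L', 'L']})

# first() keeps only the first value of each column within each group,
# so 'USA'/'2011' (t2) and 'RU'/'2015' (t5) are silently dropped.
collapsed = df.groupby('type').first()
print(collapsed)
```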

Answer 1

Use groupby/agg with ', '.join as the aggregator:

import pandas as pd

df = pd.DataFrame({'country': ['UK', 'GER', 'USA', 'AUS', 'CAN', 'SA', 'RU'],
                   'how': ['S', 'D', 'D', 'F', 'R', 'L', 'L'],
                   'type': ['t1', 't2', 't2', 't3', 't4', 't5', 't5'],
                   'year': ['2009', '2010', '2011', '2012', '2013', '2014', '2015']})

# Group on both key columns; every remaining column (country, year)
# is aggregated by joining its string values with ', '
result = df.groupby(['type', 'how']).agg(', '.join).reset_index()

yields

  type how   country        year
0   t1   S        UK        2009
1   t2   D  GER, USA  2010, 2011
2   t3   F       AUS        2012
3   t4   R       CAN        2013
4   t5   L    SA, RU  2014, 2015
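A variant sketch, assuming two things not in the question: that you want to group on 'type' alone, and that 'year' might be stored as integers rather than strings. ', '.join only accepts strings, so 'year' is cast first, and 'how' is reduced with 'first', which is safe only because the question states it is identical within each type.

```python
import pandas as pd

# Assumption for illustration: 'year' holds integers here, not strings
df = pd.DataFrame({'type': ['t1', 't2', 't2', 't3', 't4', 't5', 't5'],
                   'country': ['UK', 'GER', 'USA', 'AUS', 'CAN', 'SA', 'RU'],
                   'year': [2009, 2010, 2011, 2012, 2013, 2014, 2015],
                   'how': ['S', 'D', 'D', 'F', 'R', 'L', 'L']})

df2 = (df.astype({'year': str})               # ', '.join needs strings
         .groupby('type', as_index=False)
         .agg({'country': ', '.join,          # combine cells
               'year': ', '.join,
               'how': 'first'}))              # identical within a type
print(df2)
```

If 'how' could ever differ within a type, grouping on ['type', 'how'] as in the answer above keeps those rows apart instead of hiding the difference.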

Answer 2

To get a list in each cell instead of a joined string:

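import pandas as pd

def proc_df(df):
    # Keep only the columns to collect, then turn each column into a list
    df = df[['country', 'year']]
    return pd.Series(df.T.values.tolist(), index=df.columns)

# Uses the df built in Answer 1; each cell of the result holds a list
df.groupby(['how', 'type']).apply(proc_df)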
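A shorter alternative sketch (a swapped-in technique, not the answer's own code): passing the built-in list as the aggregator collects each group's values into a list per cell without a helper function.

```python
import pandas as pd

df = pd.DataFrame({'country': ['UK', 'GER', 'USA', 'AUS', 'CAN', 'SA', 'RU'],
                   'how': ['S', 'D', 'D', 'F', 'R', 'L', 'L'],
                   'type': ['t1', 't2', 't2', 't3', 't4', 't5', 't5'],
                   'year': ['2009', '2010', '2011', '2012', '2013', '2014', '2015']})

# agg(list) turns each group's values into a Python list, one per cell
result = df.groupby(['how', 'type'])[['country', 'year']].agg(list).reset_index()
print(result)
```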

solution1
3 2016-08-02 19:15:23

solution2
0 2016-08-02 19:23:06