Combine pandas dataframe cells in case of identical values

I'm trying to build a new dataframe in which, whenever a 'type' occurs more than once, the 'country' and 'year' cells of those rows are combined into a single row. (The 'how' column follows the 'type' column: rows with the same type always have the same how.)

My pandas dataframe, df, looks like this:

   type   country   year   how
0  't1'    'UK'    '2009'  'S' 
1  't2'    'GER'   '2010'  'D'
2  't2'    'USA'   '2011'  'D'
3  't3'    'AUS'   '2012'  'F'
4  't4'    'CAN'   '2013'  'R'
5  't5'    'SA'    '2014'  'L'
6  't5'    'RU'    '2015'  'L'

df2 should look like this:

   type   country        year         how
0  't1'    'UK'         '2009'        'S' 
1  't2'    'GER, USA'   '2010, 2011'  'D'
2  't3'    'AUS'        '2012'        'F'
3  't4'    'CAN'        '2013'        'R'
4  't5'    'SA, RU'     '2014, 2015'  'L'

I'm pretty sure a groupby on 'type' (or on 'type' and 'how') is necessary. Using first(), for example, drops the second of the rows that share a type. Is there some handy way to combine the cells (strings) instead? Thanks in advance.

Use groupby/agg with ', '.join as the aggregator:

import pandas as pd
df = pd.DataFrame({'country': ['UK', 'GER', 'USA', 'AUS', 'CAN', 'SA', 'RU'],
 'how': ['S', 'D', 'D', 'F', 'R', 'L', 'L'],
 'type': ['t1', 't2', 't2', 't3', 't4', 't5', 't5'],
 'year': ['2009', '2010', '2011', '2012', '2013', '2014', '2015']})

result = df.groupby(['type','how']).agg(', '.join).reset_index()

yields

  type how   country        year
0   t1   S        UK        2009
1   t2   D  GER, USA  2010, 2011
2   t3   F       AUS        2012
3   t4   R       CAN        2013
4   t5   L    SA, RU  2014, 2015
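Note that ', '.join only works when every aggregated cell is a string. As a minimal sketch, assuming the 'year' column held integers instead (that is not the case in the question, and join_as_str is a hypothetical helper name), each value can be converted before joining:

```python
import pandas as pd

# Small sample in the shape of the question's data; the integer years
# are an assumption made only for this illustration.
df = pd.DataFrame({
    'type': ['t1', 't2', 't2'],
    'country': ['UK', 'GER', 'USA'],
    'year': [2009, 2010, 2011],
    'how': ['S', 'D', 'D'],
})

def join_as_str(values):
    # Hypothetical helper: stringify each value, then join with ', '.
    return ', '.join(str(v) for v in values)

# as_index=False keeps 'type' and 'how' as ordinary columns;
# sort=False keeps the groups in order of first appearance.
result = (
    df.groupby(['type', 'how'], as_index=False, sort=False)
      .agg({'country': join_as_str, 'year': join_as_str})
)
print(result)
```

Passing a dict to agg also makes it explicit which columns are combined, which may be useful if the dataframe has other columns that should be aggregated differently.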

To get a list in each cell instead of a string:

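def proc_df(df):
    # Keep only the columns whose values should be collected per group
    df = df[['country', 'year']]
    # Transposing gives one row per column, so each list holds one column's values
    return pd.Series(df.T.values.tolist(), df.columns)

df.groupby(['how', 'type']).apply(proc_df)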
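A shorter alternative that should give the same kind of result in reasonably recent pandas versions is to pass the built-in list as the aggregator. This is a sketch rather than part of the original answer, and the variable name lists is only illustrative:

```python
import pandas as pd

df = pd.DataFrame({'country': ['UK', 'GER', 'USA', 'AUS', 'CAN', 'SA', 'RU'],
                   'how': ['S', 'D', 'D', 'F', 'R', 'L', 'L'],
                   'type': ['t1', 't2', 't2', 't3', 't4', 't5', 't5'],
                   'year': ['2009', '2010', '2011', '2012', '2013', '2014', '2015']})

# list is applied to each non-grouping column within each group,
# so every cell of the result holds a Python list.
lists = df.groupby(['type', 'how'], as_index=False).agg(list)
print(lists)
```

Cells for types that occur only once then hold one-element lists, so every row has the same cell type.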
