Merge pandas dataframe cells if the values are the same

Question

I'm trying to build a new dataframe in which, whenever a 'type' occurs more than once, the 'country' cells and the 'year' cells of those rows are combined into a single row. (The 'how' column behaves like the 'type' column: rows with the same type also have the same how.)

My pandas dataframe, df, looks as follows:

   type   country   year   how
0  't1'    'UK'    '2009'  'S' 
1  't2'    'GER'   '2010'  'D'
2  't2'    'USA'   '2011'  'D'
3  't3'    'AUS'   '2012'  'F'
4  't4'    'CAN'   '2013'  'R'
5  't5'    'SA'    '2014'  'L'
6  't5'    'RU'    '2015'  'L'

df2 should look like this:

   type   country        year         how
0  't1'    'UK'         '2009'        'S' 
1  't2'    'GER, USA'   '2010, 2011'  'D'
2  't3'    'AUS'        '2012'        'F'
3  't4'    'CAN'        '2013'        'R'
4  't5'    'SA, RU'     '2014, 2015'  'L'

I'm pretty sure a groupby on 'type' (or on 'type' and 'how') is necessary. Using first(), for example, drops the second of the rows that share a type. Is there a handy way to combine the cells (strings) instead? Thanks in advance.

Answer 1

Use groupby/agg with ', '.join as the aggregator:

import pandas as pd
df = pd.DataFrame({'country': ['UK', 'GER', 'USA', 'AUS', 'CAN', 'SA', 'RU'],
 'how': ['S', 'D', 'D', 'F', 'R', 'L', 'L'],
 'type': ['t1', 't2', 't2', 't3', 't4', 't5', 't5'],
 'year': ['2009', '2010', '2011', '2012', '2013', '2014', '2015']})

result = df.groupby(['type','how']).agg(', '.join).reset_index()

yields

  type how   country        year
0   t1   S        UK        2009
1   t2   D  GER, USA  2010, 2011
2   t3   F       AUS        2012
3   t4   R       CAN        2013
4   t5   L    SA, RU  2014, 2015
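The approach above relies on every aggregated column holding strings. As a hedged sketch of a variation, suppose `year` held integers instead (an assumption made here for illustration, not something stated in the question). Depending on the pandas version, `', '.join` would then fail on that column or silently drop it. The hypothetical helper `join_as_str` below is not part of the original answer; it simply casts each value to a string before joining:

```python
import pandas as pd

df = pd.DataFrame({
    'type': ['t1', 't2', 't2', 't3', 't4', 't5', 't5'],
    'country': ['UK', 'GER', 'USA', 'AUS', 'CAN', 'SA', 'RU'],
    # Assumption for illustration: years stored as integers, not strings.
    'year': [2009, 2010, 2011, 2012, 2013, 2014, 2015],
    'how': ['S', 'D', 'D', 'F', 'R', 'L', 'L'],
})


def join_as_str(values):
    """Hypothetical helper: join any values into one comma-separated string."""
    return ', '.join(str(v) for v in values)


# Name the aggregator per column so only 'country' and 'year' are combined.
result = (
    df.groupby(['type', 'how'])
      .agg({'country': join_as_str, 'year': join_as_str})
      .reset_index()
)
print(result)
```

Passing a dict to agg also makes it explicit which columns are combined, which may be useful if the dataframe later gains columns that should be aggregated differently.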

Answer 2

To get a list in each cell, as opposed to a string:

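def proc_df(df):
    # Keep only the columns whose values should be collected into lists.
    df = df[['country', 'year']]
    # Transpose so that each original column becomes one list of values.
    return pd.Series(df.T.values.tolist(), df.columns)

# Assumes pd and df are defined as in Answer 1.
df.groupby(['how', 'type']).apply(proc_df)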
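As an alternative sketch (not the answerer's method), recent pandas versions also appear to accept the built-in `list` directly as an aggregator, which avoids the helper function. The sample data is the same as in Answer 1; the variable name `lists` is chosen here for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'type': ['t1', 't2', 't2', 't3', 't4', 't5', 't5'],
    'country': ['UK', 'GER', 'USA', 'AUS', 'CAN', 'SA', 'RU'],
    'year': ['2009', '2010', '2011', '2012', '2013', '2014', '2015'],
    'how': ['S', 'D', 'D', 'F', 'R', 'L', 'L'],
})

# Select the columns to collect, then gather each group's values into a list.
lists = (
    df.groupby(['type', 'how'])[['country', 'year']]
      .agg(list)
      .reset_index()
)
print(lists)
```

Each cell in 'country' and 'year' then holds a Python list, including single-element lists for types that occur only once.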


Answer 1
3 2016-08-02 19:15:23

Answer 2
0 2016-08-02 19:23:06
