Grouping data in a DataFrame to generate lists for unique IDs in Pandas / Python

Question

Hi, I am using pandas/Python and have a DataFrame along the following lines:

21627   red
21627   green
21627   red
21627   blue
21627   purple
21628   yellow
21628   red
21628   green
21629   red
21629   red

Which I want to reduce to:

21627   red, green, blue, purple
21628   yellow, red, green
21629   red

What's the best way of doing this (and collapsing all values in the lists to unique values)?

Also, if I wanted to keep the redundancy:

21627   red, green, red, blue, purple
21628   yellow, red, green
21629   red, red

What's the best way of achieving this?

Thanks in advance for any help.

Answer 1

If you really wanted to do this, you could use a groupby apply:

In [11]: df.groupby('id').apply(lambda x: list(set(x['colours'])))
Out[11]: 
id
21627    [blue, purple, green, red]
21628          [green, red, yellow]
21629                         [red]
dtype: object

In [12]: df.groupby('id').apply(lambda x: list(x['colours']))
Out[12]: 
id
21627    [red, green, red, blue, purple]
21628               [yellow, red, green]
21629                         [red, red]
dtype: object
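For reference, the same two results can be sketched without a lambda over the whole group. This assumes the columns are named `id` and `colours`, as in the snippets above; the question does not show column names, so the frame below is a reconstruction. `unique()` keeps values in order of first appearance (unlike `set`), and `', '.join` gives the comma-separated form the question asks for.

```python
import pandas as pd

# Reconstruction of the question's data; the column names are an assumption.
df = pd.DataFrame({
    "id": [21627] * 5 + [21628] * 3 + [21629] * 2,
    "colours": ["red", "green", "red", "blue", "purple",
                "yellow", "red", "green", "red", "red"],
})

grouped = df.groupby("id")["colours"]

# Unique values per id, in order of first appearance.
unique_lists = grouped.unique().apply(list)

# All values per id, duplicates kept.
full_lists = grouped.agg(list)

# Comma-separated strings, matching the layout shown in the question.
unique_strings = grouped.agg(lambda s: ", ".join(s.unique()))
full_strings = grouped.agg(", ".join)

print(unique_strings)
print(full_strings)
```

Whether lists or joined strings are preferable depends on what happens next: strings are easy to print or export, while lists are easier to process further.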

However, DataFrames containing lists are not particularly efficient.

A pivot table gets you a more useful DataFrame:

In [21]: df.pivot_table(index='id', columns='colours', aggfunc='size', fill_value=0)
Out[21]: 
colours  blue  green  purple  red  yellow
id                                       
21627       1      1       1    2       0
21628       0      1       0    1       1
21629       0      0       0    2       0
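As a sketch of an alternative to the pivot table, `pd.crosstab` builds the same count table directly from the two columns. This is a swapped-in technique rather than part of the original answer, and the frame below again assumes columns named `id` and `colours`.

```python
import pandas as pd

# Reconstruction of the question's data; the column names are an assumption.
df = pd.DataFrame({
    "id": [21627] * 5 + [21628] * 3 + [21629] * 2,
    "colours": ["red", "green", "red", "blue", "purple",
                "yellow", "red", "green", "red", "red"],
})

# Rows are ids, columns are colours, cells count the occurrences.
counts = pd.crosstab(df["id"], df["colours"])

# Boolean membership table, if only presence matters and not the count.
present = counts > 0

print(counts)
```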

My favourite function, get_dummies, lets you do it too, but not as elegantly or efficiently (I'll keep this original, if crazy, suggestion anyway):

In [22]: pd.get_dummies(df.set_index('id')['colours']).reset_index().groupby('id').sum()
Out[22]: 
       blue  green  purple  red  yellow
id                                     
21627     1      1       1    2       0
21628     0      1       0    1       1
21629     0      0       0    2       0

Answer 2

Here's another way, though @Andy's is a bit more intuitive:

In [24]: df.groupby('id').apply(
              lambda x: x['colours'].value_counts()).unstack().fillna(0)
Out[24]: 
       blue  green  purple  red  yellow
id                                     
21627     1      1       1    2       0
21628     0      1       0    1       1
21629     0      0       0    2       0
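The `apply` with a lambda can probably be dropped here. A minimal sketch, assuming the column is called `colours` and the data looks like the question's sample: call `value_counts` on the grouped column and fill the missing combinations during `unstack`, which also keeps the counts as integers.

```python
import pandas as pd

# Reconstruction of the question's data; the column names are an assumption.
df = pd.DataFrame({
    "id": [21627] * 5 + [21628] * 3 + [21629] * 2,
    "colours": ["red", "green", "red", "blue", "purple",
                "yellow", "red", "green", "red", "red"],
})

# Count each colour within each id, then spread the colours into columns.
counts = df.groupby("id")["colours"].value_counts().unstack(fill_value=0)

print(counts)
```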

Answer 1: 7 (accepted), 2013-08-22 13:33:23

Answer 2: 2, 2013-08-22 14:03:04
