Grouping data in a dataframe to produce lists against unique ids in Pandas/Python
Hi, I am using pandas/Python and have a dataframe along the following lines:
21627 red
21627 green
21627 red
21627 blue
21627 purple
21628 yellow
21628 red
21628 green
21629 red
21629 red
Which I want to reduce to:
21627 red, green, blue, purple
21628 yellow, red, green
21629 red
What's the best way of doing this (and collapsing the values in each list to unique values)?
Also, if I wanted to keep the redundancy:
21627 red, green, red, blue, purple
21628 yellow, red, green
21629 red, red
What's the best way of achieving this?
Thanks in advance for any help.
If you really wanted to do this, you could use a groupby apply:
In [11]: df.groupby('id').apply(lambda x: list(set(x['colours'])))
Out[11]:
id
21627 [blue, purple, green, red]
21628 [green, red, yellow]
21629 [red]
dtype: object
In [12]: df.groupby('id').apply(lambda x: list(x['colours']))
Out[12]:
id
21627 [red, green, red, blue, purple]
21628 [yellow, red, green]
21629 [red, red]
dtype: object
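Note that set() does not preserve order, which is why Out[11] lists the colours in a different order from the question. As a minimal sketch, assuming the column names 'id' and 'colours' from the code above and a sample frame rebuilt from the question, the following keeps first-seen order and also produces the comma-separated strings the question asks for:

```python
import pandas as pd

# Sample data rebuilt from the question; the column names 'id' and
# 'colours' are taken from the answer's code.
df = pd.DataFrame({
    "id": [21627] * 5 + [21628] * 3 + [21629] * 2,
    "colours": ["red", "green", "red", "blue", "purple",
                "yellow", "red", "green", "red", "red"],
})

grouped = df.groupby("id")["colours"]

# Keep duplicates: one list per id, in original row order.
with_dupes = grouped.apply(list)

# Drop duplicates but keep first-seen order (dict keys are ordered,
# a set is not).
unique_ordered = grouped.apply(lambda s: list(dict.fromkeys(s)))

# Join each list into a comma-separated string.
as_text = unique_ordered.str.join(", ")
print(as_text)
```

Selecting the 'colours' column before applying means the lambda receives a Series rather than the whole group frame, which keeps the function simpler.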
However, DataFrames containing lists are not particularly efficient.
A pivot table gets you a more useful DataFrame:
In [21]: df.pivot_table(index='id', columns='colours', aggfunc=len, fill_value=0)
Out[21]:
colours blue green purple red yellow
id
21627 1 1 1 2 0
21628 0 1 0 1 1
21629 0 0 0 2 0
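The same table of counts can probably be built with pd.crosstab, which counts co-occurrences of two columns directly and does not depend on pivot_table's keyword names. This is a sketch on a sample frame rebuilt from the question, not the answer's original method:

```python
import pandas as pd

# Sample data rebuilt from the question; column names are assumed to be
# 'id' and 'colours', as in the answer's code.
df = pd.DataFrame({
    "id": [21627] * 5 + [21628] * 3 + [21629] * 2,
    "colours": ["red", "green", "red", "blue", "purple",
                "yellow", "red", "green", "red", "red"],
})

# Rows are ids, columns are colours, cells are occurrence counts.
# Combinations that never occur are filled with 0.
counts = pd.crosstab(df["id"], df["colours"])
print(counts)
```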
My favourite function, get_dummies, lets you do it too, but not as elegantly or efficiently (I'll keep this original, if crazy, suggestion):
In [22]: pd.get_dummies(df.set_index('id')['colours']).reset_index().groupby('id').sum()
Out[22]:
blue green purple red yellow
id
21627 1 1 1 2 0
21628 0 1 0 1 1
21629 0 0 0 2 0
Here's another way, though @Andy's is a bit more intuitive:
In [24]: df.groupby('id').apply(
    lambda x: x['colours'].value_counts()).unstack().fillna(0)
Out[24]:
blue green purple red yellow
id
21627 1 1 1 2 0
21628 0 1 0 1 1
21629 0 0 0 2 0
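A variant of the value_counts approach that avoids the lambda is to select the column first and call value_counts on the grouped Series. This is a sketch, again assuming the column names 'id' and 'colours' and a sample frame rebuilt from the question:

```python
import pandas as pd

# Sample data rebuilt from the question; column names are assumptions
# carried over from the answers above.
df = pd.DataFrame({
    "id": [21627] * 5 + [21628] * 3 + [21629] * 2,
    "colours": ["red", "green", "red", "blue", "purple",
                "yellow", "red", "green", "red", "red"],
})

# value_counts on the grouped column yields a Series indexed by
# (id, colour); unstack turns the colour level into columns, and
# fill_value=0 keeps the counts as integers instead of NaN floats.
counts = df.groupby("id")["colours"].value_counts().unstack(fill_value=0)
print(counts)
```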