How to group over lists in pandas dataframe
I have a dataframe which looks like this:
import pandas as pd
df = pd.DataFrame({'col1': [['a','b','c'], ['a','d'], ['c','c']]})
And I want to group the dataframe so it will look like this:
result = pd.DataFrame({'col1': [['a'], ['b'], ['c'], ['d']], 'count': [[2],[1],[3],[1]]})
If I use pd.groupby('col1').count() in Python I get the error "unhashable type: 'list'".
How can I solve this?
You need to flatten the lists with the DataFrame constructor, create a Series with stack, and finally call value_counts:
df1 = pd.DataFrame(df['col1'].values.tolist()).stack().value_counts().reset_index()
df1.columns = ['col1','count']
df1 = df1.sort_values('col1')
print (df1)
  col1  count
1    a      2
2    b      1
0    c      3
3    d      1
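The flatten-then-count steps above can also be sketched with Series.explode, which is a different technique from the DataFrame constructor plus stack used in the answer. This assumes pandas 0.25 or later, where explode is available; the rename_axis and reset_index(name=...) calls are only there so that the column names do not depend on the pandas version.

```python
import pandas as pd

df = pd.DataFrame({'col1': [['a', 'b', 'c'], ['a', 'd'], ['c', 'c']]})

# explode() puts every list element on its own row, so the values
# become plain (hashable) strings that value_counts() can tally
counts = df['col1'].explode().value_counts()

# name the label column explicitly, then sort by label
df1 = (counts.rename_axis('col1')
             .reset_index(name='count')
             .sort_values('col1')
             .reset_index(drop=True))
print(df1)
```

Unlike the constructor approach, this does not build an intermediate wide frame padded with missing values for the shorter lists.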
And if you really want lists (some pandas functions can fail on them), add applymap:
df1 = df1.applymap(lambda x: [x])
print (df1)
  col1 count
1  [a]   [2]
2  [b]   [1]
0  [c]   [3]
3  [d]   [1]
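On pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map. A minimal sketch of the same wrapping step that works on either version is shown here; the name elementwise is just an illustrative helper, and the small df1 is rebuilt by hand so the snippet runs on its own.

```python
import pandas as pd

df1 = pd.DataFrame({'col1': ['a', 'b', 'c', 'd'], 'count': [2, 1, 3, 1]})

# DataFrame.map exists from pandas 2.1 on; older versions only have applymap
elementwise = df1.map if hasattr(df1, 'map') else df1.applymap

# wrap every cell in a one-element list
wrapped = elementwise(lambda x: [x])
print(wrapped)
```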
Another solution with Counter + numpy.concatenate:
import numpy as np
from collections import Counter
df1 = pd.Series(Counter(np.concatenate(df['col1']))).reset_index()
df1.columns = ['col1','count']
df1 = df1.applymap(lambda x: [x])
print (df1)
  col1 count
0  [a]   [2]
1  [b]   [1]
2  [c]   [3]
3  [d]   [1]
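If the lists may hold mixed types, note that numpy.concatenate converts everything to one common dtype. A possible variation on the Counter approach, sketched here with itertools.chain from the standard library instead of NumPy, keeps the original Python objects; sorting the items assumes the labels are mutually comparable, as the strings are in this example.

```python
from collections import Counter
from itertools import chain

import pandas as pd

df = pd.DataFrame({'col1': [['a', 'b', 'c'], ['a', 'd'], ['c', 'c']]})

# chain.from_iterable walks through every inner list in turn
counts = Counter(chain.from_iterable(df['col1']))

# one row per (label, count) pair, ordered by label
df1 = pd.DataFrame(sorted(counts.items()), columns=['col1', 'count'])
print(df1)
```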