
How to group over lists in a pandas DataFrame

I have a dataframe which looks like this:

import pandas as pd

df = pd.DataFrame({'col1': [['a','b','c'], ['a','d'], ['c','c']]})

I want to group the dataframe so that it looks like this:

result = pd.DataFrame({'col1': [['a'], ['b'], ['c'], ['d']], 'count': [[2],[1],[3],[1]]})

If I use df.groupby('col1').count() in Python, I get the error:

"TypeError: unhashable type: 'list'"

How can I solve this?

Flatten the lists with the DataFrame constructor, turn the result into a Series with stack, and then count the values with value_counts:

df1 = pd.DataFrame(df['col1'].values.tolist()).stack().value_counts().reset_index()
df1.columns = ['col1','count']
df1 = df1.sort_values('col1')
print (df1)
  col1  count
1    a      2
2    b      1
0    c      3
3    d      1
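
As an alternative to the constructor plus stack approach above, the following is a minimal sketch that flattens the lists with Series.explode instead. It assumes pandas 0.25 or later, where explode is available; this is a swapped-in technique, not the one used in the answer. The result frame is built explicitly so that the column names do not depend on how a given pandas version names the output of reset_index.

```python
import pandas as pd

df = pd.DataFrame({'col1': [['a', 'b', 'c'], ['a', 'd'], ['c', 'c']]})

# explode() puts each list element on its own row (assumes pandas >= 0.25)
counts = df['col1'].explode().value_counts()

# Build the two-column frame explicitly instead of relying on reset_index naming
df1 = pd.DataFrame({'col1': counts.index, 'count': counts.values})
df1 = df1.sort_values('col1').reset_index(drop=True)
print(df1)
```

Sorting by col1 and resetting the index gives a stable row order, since value_counts orders rows by frequency rather than by label.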

If you really want lists (note that some pandas functions can fail on list-valued columns), add applymap:

df1 = df1.applymap(lambda x: [x])
print (df1)
  col1 count
1  [a]   [2]
2  [b]   [1]
0  [c]   [3]
3  [d]   [1]
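
Note that DataFrame.applymap was deprecated in pandas 2.1 in favor of DataFrame.map. The following sketch wraps each value in a one-element list with plain list comprehensions, which should behave the same way regardless of which of the two methods the installed version provides. The name wrapped is only illustrative, and the flat frame is rebuilt by hand so the example runs on its own.

```python
import pandas as pd

# Flat counts from the previous step, rebuilt here so the sketch is standalone
df1 = pd.DataFrame({'col1': ['a', 'b', 'c', 'd'], 'count': [2, 1, 3, 1]})

# Wrap every value in a one-element list, column by column
wrapped = pd.DataFrame(
    {col: [[value] for value in df1[col]] for col in df1.columns},
    index=df1.index,
)
print(wrapped)
```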

Another solution uses Counter with numpy.concatenate:

from collections import Counter
import numpy as np

df1 = pd.Series(Counter(np.concatenate(df['col1']))).reset_index()
df1.columns = ['col1','count']
df1 = df1.applymap(lambda x: [x])
print (df1)
  col1 count
0  [a]   [2]
1  [b]   [1]
2  [c]   [3]
3  [d]   [1]
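
The same counting idea can also be sketched without numpy, using itertools.chain.from_iterable to walk the lists. This is a variation on the Counter approach above, not part of the original answer, and it leaves the values flat rather than wrapped in lists.

```python
from collections import Counter
from itertools import chain

import pandas as pd

df = pd.DataFrame({'col1': [['a', 'b', 'c'], ['a', 'd'], ['c', 'c']]})

# chain.from_iterable lazily yields every element of every list in the column
counter = Counter(chain.from_iterable(df['col1']))

# sorted() makes the row order explicit (alphabetical by value)
df1 = pd.DataFrame(sorted(counter.items()), columns=['col1', 'count'])
print(df1)
```

Because the elements are never converted to a NumPy array, this variant keeps the original Python objects as keys, which may matter if the lists hold mixed types.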
