How to group lists in a pandas DataFrame

Question

I have a DataFrame that looks like this:

df = pd.DataFrame({'col1': [['a','b','c'], ['a','d'], ['c','c']]})

I want to group the DataFrame so that it looks like this:

result = pd.DataFrame({'col1': [['a'], ['b'], ['c'], ['d']], 'count': [[2],[1],[3],[1]]})

If I use df.groupby('col1').count() in Python, I get the error:

TypeError: unhashable type: 'list'

How can I solve this?

Answer 1

You need to flatten the lists with the DataFrame constructor, turn the result into a Series with stack, and finally call value_counts:

# Flatten the lists, count each value, then name and sort the columns
df1 = pd.DataFrame(df['col1'].values.tolist()).stack().value_counts().reset_index()
df1.columns = ['col1','count']
df1 = df1.sort_values('col1')
print(df1)
  col1  count
1    a      2
2    b      1
0    c      3
3    d      1
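To make the chain above easier to follow, here is a minimal sketch that splits it into its three steps. The intermediate names (wide, flat, counts) are only for illustration, and the explicit dropna() is a safeguard I added in case your pandas version keeps the padding values when stacking.

```python
import pandas as pd

df = pd.DataFrame({'col1': [['a', 'b', 'c'], ['a', 'd'], ['c', 'c']]})

# Step 1: one row per original list, one column per position;
# shorter lists are padded with missing values
wide = pd.DataFrame(df['col1'].values.tolist())

# Step 2: fold all columns into a single Series; dropna() makes sure
# the padding is gone regardless of how stack() treats missing values
flat = wide.stack().dropna()

# Step 3: count how often each value occurs
counts = flat.value_counts()
print(counts)
```

The point of the first two steps is that the individual strings are hashable, while the original lists are not, so counting works once the lists are flattened.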

And if you really want lists (some pandas functions can fail on them), add applymap:

df1 = df1.applymap(lambda x: [x])  # wrap every cell in a one-element list
print(df1)
  col1 count
1  [a]   [2]
2  [b]   [1]
0  [c]   [3]
3  [d]   [1]
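One caveat: in pandas 2.1 and later, applymap is deprecated in favour of DataFrame.map. The sketch below picks whichever is available; the elementwise and wrapped names are just illustrative, and the small frame is rebuilt by hand so the snippet runs on its own.

```python
import pandas as pd

# Same counts as above, rebuilt here so the snippet is self-contained
df1 = pd.DataFrame({'col1': ['a', 'b', 'c', 'd'], 'count': [2, 1, 3, 1]})

# DataFrame.map exists from pandas 2.1 on; older versions only have applymap
if hasattr(pd.DataFrame, 'map'):
    elementwise = df1.map
else:
    elementwise = df1.applymap

# Wrap every cell in a one-element list
wrapped = elementwise(lambda x: [x])
print(wrapped)
```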

Another solution with Counter + numpy.concatenate:

from collections import Counter

import numpy as np

# Join all lists into one array, count each value, then build the frame
df1 = pd.Series(Counter(np.concatenate(df['col1']))).reset_index()
df1.columns = ['col1','count']
df1 = df1.applymap(lambda x: [x])
print(df1)
  col1 count
0  [a]   [2]
1  [b]   [1]
2  [c]   [3]
3  [d]   [1]
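If your pandas is 0.25 or newer, Series.explode is probably a shorter way to do the flattening. This is a different technique from the two above, not a rewrite of them, and the variable name out is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col1': [['a', 'b', 'c'], ['a', 'd'], ['c', 'c']]})

out = (
    df['col1']
    .explode()                    # one row per list element
    .value_counts()               # count each distinct element
    .sort_index()                 # order by value: a, b, c, d
    .rename_axis('col1')          # name the index so it becomes the 'col1' column
    .reset_index(name='count')    # back to a two-column DataFrame
)
print(out)
```

This gives plain scalar cells; wrap them in lists afterwards only if you really need that shape.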

2 2017-05-16 10:46:14
