Pandas groupby for multiple values in a column

Question

I have a data frame similar to the following:

+----------------+-------+
| class          | year  |
+----------------+-------+
| ['A', 'B']     | 2001  |
| ['A']          | 2002  |
| ['B']          | 2001  |
| ['A', 'B', 'C']| 2003  |
| ['B', 'C']     | 2001  |
| ['C']          | 2003  |
+----------------+-------+

I want to create a data frame from this so that the resulting table shows the count of each category in class per year.

+-----+----+----+----+
|year | A  | B  | C  |
+-----+----+----+----+
|2001 | 1  | 3  | 1  |
|2002 | 1  | 0  | 0  |
|2003 | 1  | 1  | 2  |
+-----+----+----+----+

What's the easiest way to do this?

Answer 1

Try unnesting the list column first (the unnesting helper is defined at the end of this answer):

s=unnesting(df,['class'])

Then build a crosstab of year against class:

pd.crosstab(s['year'],s['class'])
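
On pandas 0.25 or later, the built-in DataFrame.explode should be able to stand in for the custom helper. The following is a minimal sketch under that assumption; the sample frame is rebuilt from the question's table, and the variable name counts is only illustrative.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "class": [["A", "B"], ["A"], ["B"], ["A", "B", "C"], ["B", "C"], ["C"]],
    "year": [2001, 2002, 2001, 2003, 2001, 2003],
})

# DataFrame.explode emits one row per list element and repeats the
# other columns, which is the same reshaping the unnesting helper does.
s = df.explode("class")

# Count each (year, class) pair; combinations that never occur become 0.
counts = pd.crosstab(s["year"], s["class"])
print(counts)
```

For this sample the counts agree with the table requested in the question (for example, B appears three times in 2001).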

Method using sklearn

from sklearn.preprocessing import MultiLabelBinarizer
mlb = MultiLabelBinarizer()
pd.DataFrame(mlb.fit_transform(df['class']),columns=mlb.classes_, index=df.year).groupby(level=0).sum()
Out[293]: 
      A  B  C
year         
2001  1  3  1
2002  1  0  0
2003  1  1  2

Method using get_dummies

df.set_index('year')['class'].apply(','.join).str.get_dummies(sep=',').groupby(level=0).sum()
Out[297]: 
      A  B  C
year         
2001  1  3  1
2002  1  0  0
2003  1  1  2
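
Since the title asks about groupby specifically, the same table can probably also be reached with a plain groupby instead of one-hot encoding. This is a sketch, assuming pandas 0.25 or later for explode; the sample frame is rebuilt from the question and the name counts is only illustrative.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "class": [["A", "B"], ["A"], ["B"], ["A", "B", "C"], ["B", "C"], ["C"]],
    "year": [2001, 2002, 2001, 2003, 2001, 2003],
})

counts = (
    df.explode("class")             # one row per (year, single class label)
      .groupby(["year", "class"])   # group on both keys
      .size()                       # number of rows in each group
      .unstack(fill_value=0)        # pivot class labels into columns, 0 where absent
)
print(counts)
```

Unlike the get_dummies variant, this does not join the labels into a delimited string, so it should also cope with labels that contain commas.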

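import numpy as np
import pandas as pd

# Helper for the first method: expand each list in the `explode` columns into one row per element.
def unnesting(df, explode):
    # Repeat each index label once per element of its list.
    idx = df.index.repeat(df[explode[0]].str.len())
    df1 = pd.concat([
        pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
    df1.index = idx

    # Re-attach the untouched columns (here, year) by index.
    return df1.join(df.drop(columns=explode), how='left')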

Answer 1 (score 5, accepted), 2019-04-16 01:30:00