Pandas: create a new column with count from groupby

Question

I have a df that looks like the following:

id        item        color
01        truck       red
02        truck       red
03        car         black
04        truck       blue
05        car         black

I am trying to create a df that looks like this:

item      color       count
truck     red          2
truck     blue         1
car       black        2

I have tried:

df["count"] = df.groupby("item")["color"].transform('count')

But it is not quite what I am looking for.

Any guidance is appreciated.

Answer 1

That's not a new column, that's a new DataFrame:

In [11]: df.groupby(["item", "color"]).count()
Out[11]:
             id
item  color
car   black   2
truck blue    1
      red     2

To get the result you want, use reset_index:

In [12]: df.groupby(["item", "color"])["id"].count().reset_index(name="count")
Out[12]:
    item  color  count
0    car  black      2
1  truck   blue      1
2  truck    red      2

To get a "new column" you could use transform:

In [13]: df.groupby(["item", "color"])["id"].transform("count")
Out[13]:
0    2
1    2
2    2
3    1
4    2
dtype: int64

I recommend reading the split-apply-combine section of the docs.
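As a minimal sketch of the transform approach, the result can be assigned straight back to the original frame, because transform returns one value per input row. The frame below is rebuilt from the question's table, and the column name "count" is just an illustrative choice.

```python
import pandas as pd

# Sample data mirroring the table in the question.
df = pd.DataFrame({
    "id": ["01", "02", "03", "04", "05"],
    "item": ["truck", "truck", "car", "truck", "car"],
    "color": ["red", "red", "black", "blue", "black"],
})

# transform keeps the original row order and length, so the per-group
# count can be stored as a regular column on df.
df["count"] = df.groupby(["item", "color"])["id"].transform("count")
print(df)
```

Every row keeps its own id here, and rows in the same (item, color) group share the same count.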

Answer 2

Another possible way to achieve the desired output is named aggregation, which lets you specify the name and the aggregation function for each output column.

Named aggregation

(New in version 0.25.0.)

To support column-specific aggregation with control over the output column names, pandas accepts the special syntax in GroupBy.agg(), known as "named aggregation", where:

The keywords are the output column names.

The values are tuples whose first element is the column to select and the second element is the aggregation to apply to that column. Pandas provides the pandas.NamedAgg named tuple with the fields ['column', 'aggfunc'] to make it clearer what the arguments are. As usual, the aggregation can be a callable or a string alias.

So to get the desired output, you could try something like:

import pandas as pd
# Setup
df = pd.DataFrame([
    {
        "item":"truck",
        "color":"red"
    },
    {
        "item":"truck",
        "color":"red"
    },
    {
        "item":"car",
        "color":"black"
    },
    {
        "item":"truck",
        "color":"blue"
    },
    {
        "item":"car",
        "color":"black"
    }
])

df_grouped = df.groupby(["item", "color"]).agg(
    count_col=pd.NamedAgg(column="color", aggfunc="count")
)
print(df_grouped)

Which produces the following output:

             count_col
item  color
car   black          2
truck blue           1
      red            2
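To illustrate the point that the values may be plain tuples or pandas.NamedAgg instances, here is a small sketch that builds the same result both ways. It assumes a frame with an extra "id" column (as in the question) and aggregates on that column; the variable names are illustrative only.

```python
import pandas as pd

# Sample data mirroring the table in the question.
df = pd.DataFrame({
    "id": ["01", "02", "03", "04", "05"],
    "item": ["truck", "truck", "car", "truck", "car"],
    "color": ["red", "red", "black", "blue", "black"],
})

grouped = df.groupby(["item", "color"])

# Explicit NamedAgg form: field names document what each argument means.
with_namedagg = grouped.agg(count_col=pd.NamedAgg(column="id", aggfunc="count"))

# Plain (column, aggfunc) tuple form of the same aggregation.
with_tuple = grouped.agg(count_col=("id", "count"))

print(with_namedagg.equals(with_tuple))
```

Both forms should yield the same frame; NamedAgg mainly helps readability when several aggregations are listed.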

Answer 3

You can use value_counts and name the column with reset_index:

In [3]: df[['item', 'color']].value_counts().reset_index(name='counts')
Out[3]: 
    item  color  counts
0    car  black       2
1  truck    red       2
2  truck   blue       1
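A possible variant, sketched under the assumption of pandas 1.1 or later (where DataFrame.value_counts is available): the subset parameter selects the columns to count without slicing the frame first, and sort=False skips the default sort by frequency. The frame is rebuilt from the question's table.

```python
import pandas as pd

# Sample data mirroring the table in the question.
df = pd.DataFrame({
    "id": ["01", "02", "03", "04", "05"],
    "item": ["truck", "truck", "car", "truck", "car"],
    "color": ["red", "red", "black", "blue", "black"],
})

# subset picks the columns whose unique combinations are counted;
# sort=False turns off the default descending sort by count.
counts = (
    df.value_counts(subset=["item", "color"], sort=False)
      .reset_index(name="counts")
)
print(counts)
```

Row order can differ from the output above, since groups with equal counts have no guaranteed relative order.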

Answer 4

Here is another option:

import numpy as np
df['Counts'] = np.zeros(len(df))
grp_df = df.groupby(['item', 'color']).count()

which results in:

             Counts
item  color        
car   black       2
truck blue        1
      red         2
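For comparison, a sketch of a variant that avoids the placeholder column: GroupBy.size() counts the rows in each group directly, and Series.to_frame names the resulting column. This is an alternative technique, not what the answer above does; the column name "Counts" is kept only to match its output.

```python
import pandas as pd

# Sample data mirroring the table in the question (without the id column).
df = pd.DataFrame({
    "item": ["truck", "truck", "car", "truck", "car"],
    "color": ["red", "red", "black", "blue", "black"],
})

# size() returns the number of rows per group, so no helper column is needed.
grp_df = df.groupby(["item", "color"]).size().to_frame("Counts")
print(grp_df)
```

Unlike count, size tallies every row in the group, including rows whose other columns hold missing values.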

Answer 5

An option that is more literal than the accepted answer:

df.groupby(["item", "color"], as_index=False).agg(count=("item", "count"))

Any column without missing values can be used in place of "item" in the aggregation, since "count" tallies only non-null values.

"as_index=False" prevents the grouping columns from becoming the index.

5 answers

Answer 1: 105, accepted, 2015-04-24 00:31:07
Answer 2: 13, 2020-02-11 00:06:10
Answer 3: 6, 2022-05-30 18:36:16
Answer 4: 2, 2020-04-30 19:20:35
Answer 5: 0, 2023-02-01 20:19:08
