How do I use pandas to find the count of a particular column [Model] based on another column [SoldDate]?

Question

I have a dataframe with 3 columns: SoldDate, Model and TotalSoldCount. How do I create a new column, 'CountSoldbyMonth', which gives the count of each model sold per month? The sample data below describes the problem. 'CountSoldbyMonth' should always be less than 'TotalSoldCount'.

I am new to Python.

Date        Model  TotalSoldCount
Jan 19        A          4
Jan 19        A          4
Jan 19        A          4
Jan 19        B          6
Jan 19        C          2
Jan 19        C          2
Feb 19        A          4
Feb 19        B          6
Feb 19        B          6
Feb 19        B          6
Mar 19        B          6
Mar 19        B          6

The new df should look like this.

Date      Model     TotalSoldCount     CountSoldbyMonth
Jan 19     A               4                    3
Jan 19     A               4                    3
Jan 19     A               4                    3
Jan 19     B               6                    1
Jan 19     C               2                    2
Jan 19     C               2                    2
Feb 19     A               4                    1
Feb 19     B               6                    3
Feb 19     B               6                    3
Feb 19     B               6                    3
Mar 19     B               6                    2
Mar 19     B               6                    2

I tried:

df['CountSoldbyMonth'] = df.groupby(['date','model']).totalsoldcount.transform('sum')

but it is generating a different value.

Answer 1

It's easier to help if you give code that lets the reader experiment. In this case, I'd think taking your dataframe (df) and doing the following should work:

df['CountSoldbyMonth'] = df.groupby(['Date','Model'])['TotalSoldCount'].transform('sum')
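
For anyone who wants to experiment, here is a minimal sketch that rebuilds the sample frame from the question and compares two aggregations inside the transform. The construction code is an assumption about how the data is laid out, not something the asker posted. On this data 'sum' adds up the TotalSoldCount values in each Date/Model group, while 'count' counts the rows in the group, so it is worth checking which of the two matches the expected output.

```python
import pandas as pd

# Rebuild the 12-row sample from the question (layout assumed from the table).
df = pd.DataFrame({
    "Date": ["Jan 19"] * 6 + ["Feb 19"] * 4 + ["Mar 19"] * 2,
    "Model": list("AAABCC") + list("ABBB") + list("BB"),
})
# In the sample, TotalSoldCount is constant per model.
df["TotalSoldCount"] = df["Model"].map({"A": 4, "B": 6, "C": 2})

grouped = df.groupby(["Date", "Model"])["TotalSoldCount"]

# Adds up the column values inside each (Date, Model) group.
summed = grouped.transform("sum")
# Counts the non-null rows inside each (Date, Model) group.
counted = grouped.transform("count")

print(df.assign(Summed=summed, Counted=counted))
# Counted is 3, 3, 3, 1, 2, 2, 1, 3, 3, 3, 2, 2 for the twelve rows.
```

Both results have the same length and index as df, which is why either can be assigned straight back as a new column.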

Answer 2

Suppose you have this data set:

      date model  totalsoldcount
0   Jan 19     A             110
1   Jan 19     A             110
2   Jan 19     A             110
3   Jan 19     B              50
4   Jan 19     C              70
5   Jan 19     C              70
6   Feb 19     A             110
7   Feb 19     B              50
8   Feb 19     B              50
9   Feb 19     B              50
10  Mar 19     B              50
11  Mar 19     B              50

And you want to define a new column, countsoldbymonth. You can group by the date and model columns, sum totalsoldcount with a transform, and assign the result to the new column:

s['countsoldbymonth'] = s.groupby([
    'date',
    'model'
]).totalsoldcount.transform('sum')

print(s)

      date model  totalsoldcount  countsoldbymonth
0   Jan 19     A             110               330
1   Jan 19     A             110               330
2   Jan 19     A             110               330
3   Jan 19     B              50                50
4   Jan 19     C              70               140
5   Jan 19     C              70               140
6   Feb 19     A             110               110
7   Feb 19     B              50               150
8   Feb 19     B              50               150
9   Feb 19     B              50               150
10  Mar 19     B              50               100
11  Mar 19     B              50               100

Or, if you just want to see the sums without creating a new column, you can use sum instead of transform, like this:

print(s.groupby([
    'date',
    'model'
]).totalsoldcount.sum())

date    model
Feb 19  A        110
        B        150
Jan 19  A        330
        B         50
        C        140
Mar 19  B        100
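
To make the difference between the two calls concrete, here is a small sketch, assuming the same s frame as above (the construction code is mine). A plain sum returns one value per (date, model) group, indexed by the group keys, while transform('sum') returns one value per original row. By default groupby sorts the group keys, which is why Feb 19 is printed before Jan 19; passing sort=False keeps the order of first appearance.

```python
import pandas as pd

# Rebuild the answer's sample frame (layout assumed from the printed table).
s = pd.DataFrame({
    "date": ["Jan 19"] * 6 + ["Feb 19"] * 4 + ["Mar 19"] * 2,
    "model": list("AAABCC") + list("ABBB") + list("BB"),
})
s["totalsoldcount"] = s["model"].map({"A": 110, "B": 50, "C": 70})

# Aggregation: one row per group, keys kept in order of first appearance.
totals = s.groupby(["date", "model"], sort=False)["totalsoldcount"].sum()

# Transform: one value per original row, aligned with s.index.
per_row = s.groupby(["date", "model"])["totalsoldcount"].transform("sum")

# Turn the MultiIndex result into ordinary columns if a flat table is easier.
flat = totals.reset_index()

print(len(totals), len(per_row))  # 6 groups versus 12 rows
print(flat)
```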

Edit

If you just want to know how many sales were made in each month, you can do the same groupby, but use count instead of sum:

df['CountSoldByMonth'] = df.groupby([
    'Date',
    'Model'
]).TotalSoldCount.transform('count')

print(df)

      Date Model  TotalSoldCount  CountSoldByMonth
0   Jan 19     A               4                 3
1   Jan 19     A               4                 3
2   Jan 19     A               4                 3
3   Jan 19     B               6                 1
4   Jan 19     C               2                 2
5   Jan 19     C               2                 2
6   Feb 19     A               4                 1
7   Feb 19     B               6                 3
8   Feb 19     B               6                 3
9   Feb 19     B               6                 3
10  Mar 19     B               6                 2
11  Mar 19     B               6                 2
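
One caveat that may matter on real data: 'count' only counts non-null values in the selected column, whereas 'size' counts every row in the group. The frame below is a made-up four-row example with one missing value (it is not the asker's data, and the NonNullCount and RowCount column names are hypothetical), just to show where the two differ.

```python
import pandas as pd

# Hypothetical data: the second Jan 19 row has a missing TotalSoldCount.
df = pd.DataFrame({
    "Date": ["Jan 19", "Jan 19", "Jan 19", "Feb 19"],
    "Model": ["A", "A", "A", "A"],
    "TotalSoldCount": [4, float("nan"), 4, 4],
})

grouped = df.groupby(["Date", "Model"])["TotalSoldCount"]

# 'count' skips the missing value, so the Jan 19 / A group gives 2.
df["NonNullCount"] = grouped.transform("count")
# 'size' counts rows regardless of missing values, so the same group gives 3.
df["RowCount"] = grouped.transform("size")

print(df)
```

If the column being counted has no missing values, as in the sample above, the two give the same result.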

2 answers

Answer 1
0 2019-06-24 21:15:38

Answer 2
0 Accepted 2019-06-24 21:17:43
