How do I find the count of a particular column [Model], based on another column [SoldDate], using pandas?
I have a DataFrame with 3 columns: SoldDate, Model and TotalSoldCount. How do I create a new column, 'CountSoldbyMonth', that gives the count of each model sold in each month? Sample data is shown below. 'CountSoldbyMonth' should always be less than 'TotalSoldCount'.
I am new to Python.
Date    Model  TotalSoldCount
Jan 19  A      4
Jan 19  A      4
Jan 19  A      4
Jan 19  B      6
Jan 19  C      2
Jan 19  C      2
Feb 19  A      4
Feb 19  B      6
Feb 19  B      6
Feb 19  B      6
Mar 19  B      6
Mar 19  B      6
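For anyone who wants to experiment, here is a minimal sketch that rebuilds the sample table above as a DataFrame. The column is named `Date` as in the table, although the text calls it SoldDate, and the month labels are kept as plain strings.

```python
import pandas as pd

# Sample data from the question; "Date" holds the month labels exactly as shown.
df = pd.DataFrame({
    "Date": ["Jan 19"] * 6 + ["Feb 19"] * 4 + ["Mar 19"] * 2,
    "Model": ["A", "A", "A", "B", "C", "C", "A", "B", "B", "B", "B", "B"],
    "TotalSoldCount": [4, 4, 4, 6, 2, 2, 4, 6, 6, 6, 6, 6],
})
print(df)
```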
The new df should look like this:
Date    Model  TotalSoldCount  CountSoldbyMonth
Jan 19  A      4               3
Jan 19  A      4               3
Jan 19  A      4               3
Jan 19  B      6               1
Jan 19  C      2               2
Jan 19  C      2               2
Feb 19  A      4               1
Feb 19  B      6               3
Feb 19  B      6               3
Feb 19  B      6               3
Mar 19  B      6               2
Mar 19  B      6               2
I tried:
df['CountSoldbyMonth'] = df.groupby(['date','model']).totalsoldcount.transform('sum')
but it generates different values.
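The attempt above gives different numbers because `transform('sum')` adds up the TotalSoldCount values inside each group instead of counting the rows. A small sketch on the sample data makes the difference visible. The helper column names `SumInGroup` and `RowsInGroup` are made up for illustration, and the lowercase names in the attempt are assumed to correspond to the table's `Date`, `Model` and `TotalSoldCount`.

```python
import pandas as pd

# Sample data from the question (column named "Date", as in the table).
df = pd.DataFrame({
    "Date": ["Jan 19"] * 6 + ["Feb 19"] * 4 + ["Mar 19"] * 2,
    "Model": ["A", "A", "A", "B", "C", "C", "A", "B", "B", "B", "B", "B"],
    "TotalSoldCount": [4, 4, 4, 6, 2, 2, 4, 6, 6, 6, 6, 6],
})

keys = ["Date", "Model"]

# 'sum' adds TotalSoldCount within each (Date, Model) group: Jan 19 / A -> 4 + 4 + 4 = 12.
df["SumInGroup"] = df.groupby(keys)["TotalSoldCount"].transform("sum")

# 'count' counts the non-null rows in each group: Jan 19 / A -> 3, the value the question expects.
df["RowsInGroup"] = df.groupby(keys)["TotalSoldCount"].transform("count")

print(df)
```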
It's easier to help if you give code that lets the user experiment. In this case, I'd think taking your dataframe (df) and doing the following should work:
df['CountSoldbyMonth'] = df.groupby(['Date','Model'])['TotalSoldCount'].transform('sum')
Suppose you have this data set:
    date    model  totalsoldcount
0   Jan 19  A      110
1   Jan 19  A      110
2   Jan 19  A      110
3   Jan 19  B      50
4   Jan 19  C      70
5   Jan 19  C      70
6   Feb 19  A      110
7   Feb 19  B      50
8   Feb 19  B      50
9   Feb 19  B      50
10  Mar 19  B      50
11  Mar 19  B      50
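The code in this answer operates on a DataFrame named `s`. A sketch that rebuilds it from the table above, so the snippets can be run as-is:

```python
import pandas as pd

# The answer's example data set, with lowercase column names as printed above.
s = pd.DataFrame({
    "date": ["Jan 19"] * 6 + ["Feb 19"] * 4 + ["Mar 19"] * 2,
    "model": ["A", "A", "A", "B", "C", "C", "A", "B", "B", "B", "B", "B"],
    "totalsoldcount": [110, 110, 110, 50, 70, 70, 110, 50, 50, 50, 50, 50],
})
print(s)
```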
And you want to define a new column, countsoldbymonth. You can groupby the date and model columns, sum the totalsoldcount with a transform, and then create the new column:
s['countsoldbymonth'] = s.groupby([
'date',
'model'
]).totalsoldcount.transform('sum')
print(s)
    date    model  totalsoldcount  countsoldbymonth
0   Jan 19  A      110             330
1   Jan 19  A      110             330
2   Jan 19  A      110             330
3   Jan 19  B      50              50
4   Jan 19  C      70              140
5   Jan 19  C      70              140
6   Feb 19  A      110             110
7   Feb 19  B      50              150
8   Feb 19  B      50              150
9   Feb 19  B      50              150
10  Mar 19  B      50              100
11  Mar 19  B      50              100
Or, if you just want to see the sums without creating a new column, you can use sum instead of transform, like this:
print(s.groupby([
'date',
'model'
]).totalsoldcount.sum())
date    model
Feb 19  A      110
        B      150
Jan 19  A      330
        B      50
        C      140
Mar 19  B      100
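The two forms are related: the aggregated result has one row per (date, model) group, and transform broadcasts each group's value back onto every row of that group. A sketch that checks this on the example data; the names `totals`, `merged` and `via_transform` are illustrative only.

```python
import pandas as pd

# The answer's example data set.
s = pd.DataFrame({
    "date": ["Jan 19"] * 6 + ["Feb 19"] * 4 + ["Mar 19"] * 2,
    "model": ["A", "A", "A", "B", "C", "C", "A", "B", "B", "B", "B", "B"],
    "totalsoldcount": [110, 110, 110, 50, 70, 70, 110, 50, 50, 50, 50, 50],
})
keys = ["date", "model"]

# Aggregated form: one row per (date, model) group.
totals = s.groupby(keys)["totalsoldcount"].sum().reset_index(name="countsoldbymonth")

# Left-merging the totals back on the group keys repeats each total on every row of its group...
merged = s.merge(totals, on=keys, how="left")

# ...which is what transform('sum') returns directly, aligned to the original rows.
via_transform = s.groupby(keys)["totalsoldcount"].transform("sum")

print((merged["countsoldbymonth"].to_numpy() == via_transform.to_numpy()).all())
```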
If you just want to know how many sales were made in the month, you can do the same groupby, but use count instead of sum:
df['CountSoldByMonth'] = df.groupby([
'Date',
'Model'
]).TotalSoldCount.transform('count')
print(df)
    Date    Model  TotalSoldCount  CountSoldByMonth
0   Jan 19  A      4               3
1   Jan 19  A      4               3
2   Jan 19  A      4               3
3   Jan 19  B      6               1
4   Jan 19  C      2               2
5   Jan 19  C      2               2
6   Feb 19  A      4               1
7   Feb 19  B      6               3
8   Feb 19  B      6               3
9   Feb 19  B      6               3
10  Mar 19  B      6               2
11  Mar 19  B      6               2
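The sample groups directly on month labels like "Jan 19". If SoldDate instead held full sale dates (an assumption; the question does not show such data), one option is to derive the month with `dt.to_period('M')` and group on that. The dates below and the helper `Month` column are hypothetical. Note that `count` skips missing values in the counted column, while `GroupBy.size()` counts every row.

```python
import pandas as pd

# Hypothetical full sale dates; the question only shows month labels such as "Jan 19".
df = pd.DataFrame({
    "SoldDate": pd.to_datetime(["2019-01-03", "2019-01-15", "2019-01-28",
                                "2019-02-02", "2019-02-20", "2019-03-05"]),
    "Model": ["A", "A", "B", "A", "B", "B"],
})

# Reduce each date to its calendar month so every January sale lands in the same group.
df["Month"] = df["SoldDate"].dt.to_period("M")

# Count the rows per (Month, Model) group and broadcast the count back to each row.
df["CountSoldbyMonth"] = df.groupby(["Month", "Model"])["SoldDate"].transform("count")

print(df)
```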