How do I find the count of a particular column [Model], based on another column [SoldDate] using pandas?

I have a dataframe with 3 columns: SoldDate, Model and TotalSoldCount. How do I create a new column, 'CountSoldbyMonth', which gives the count of each model sold in each month? An example describing the problem is given below. The 'CountSoldbyMonth' should never exceed the 'TotalSoldCount'.

I am new to Python.

Date        Model  TotalSoldCount
Jan 19        A          4
Jan 19        A          4
Jan 19        A          4
Jan 19        B          6
Jan 19        C          2
Jan 19        C          2
Feb 19        A          4
Feb 19        B          6
Feb 19        B          6
Feb 19        B          6
Mar 19        B          6
Mar 19        B          6

The new df should look like this.

Date      Model     TotalSoldCount     CountSoldbyMonth
Jan 19     A               4                    3
Jan 19     A               4                    3
Jan 19     A               4                    3
Jan 19     B               6                    1
Jan 19     C               2                    2
Jan 19     C               2                    2
Feb 19     A               4                    1
Feb 19     B               6                    3
Feb 19     B               6                    3
Feb 19     B               6                    3
Mar 19     B               6                    2
Mar 19     B               6                    2

I tried doing

df['CountSoldbyMonth'] = df.groupby(['date','model']).totalsoldcount.transform('sum')

but it is generating a different value.

It's easier to help if you give code that lets the reader experiment. In this case, I'd think taking your dataframe (df) and doing the following should work:

df['CountSoldbyMonth'] = df.groupby(['Date','Model'])['TotalSoldCount'].transform('sum')
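For anyone who wants to experiment, here is a minimal sketch that rebuilds the question's table and runs the line above next to a 'count' variant. The helper column name SumByMonth is only an illustration, not something from the question. With this data, 'sum' adds up the TotalSoldCount values inside each (Date, Model) group, while 'count' returns the number of rows in the group, which is what the expected output in the question shows.

```python
import pandas as pd

# Data copied from the question's table.
df = pd.DataFrame({
    "Date": ["Jan 19"] * 6 + ["Feb 19"] * 4 + ["Mar 19"] * 2,
    "Model": list("AAABCC") + list("ABBB") + list("BB"),
    "TotalSoldCount": [4, 4, 4, 6, 2, 2, 4, 6, 6, 6, 6, 6],
})

grouped = df.groupby(["Date", "Model"])["TotalSoldCount"]

# 'sum' adds the TotalSoldCount values within each (Date, Model) group.
sums = grouped.transform("sum")

# 'count' returns the number of non-null rows in each group.
counts = grouped.transform("count")

df["SumByMonth"] = sums            # illustrative column name
df["CountSoldbyMonth"] = counts

print(df)
```

For Jan 19 / A this gives 12 in SumByMonth (4 + 4 + 4) and 3 in CountSoldbyMonth (three rows).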

Suppose you have this data set:

      date model  totalsoldcount
0   Jan 19     A             110
1   Jan 19     A             110
2   Jan 19     A             110
3   Jan 19     B              50
4   Jan 19     C              70
5   Jan 19     C              70
6   Feb 19     A             110
7   Feb 19     B              50
8   Feb 19     B              50
9   Feb 19     B              50
10  Mar 19     B              50
11  Mar 19     B              50

And you want to define a new column, countsoldbymonth. You can group by the date and model columns, sum totalsoldcount with a transform, and assign the result to the new column:

s['countsoldbymonth'] = s.groupby([
    'date',
    'model'
]).totalsoldcount.transform('sum')

print(s)

      date model  totalsoldcount  countsoldbymonth
0   Jan 19     A             110               330
1   Jan 19     A             110               330
2   Jan 19     A             110               330
3   Jan 19     B              50                50
4   Jan 19     C              70               140
5   Jan 19     C              70               140
6   Feb 19     A             110               110
7   Feb 19     B              50               150
8   Feb 19     B              50               150
9   Feb 19     B              50               150
10  Mar 19     B              50               100
11  Mar 19     B              50               100
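The snippet above assumes a dataframe named s already exists. A self-contained sketch of the same steps, with s rebuilt from the table shown earlier, could look like this. The per-model mapping is just a compact way to type in those values. It also checks why the result can be assigned as a column: transform returns one value per original row, on the same index as s.

```python
import pandas as pd

# Rebuild the example data set shown above.
s = pd.DataFrame({
    "date": ["Jan 19"] * 6 + ["Feb 19"] * 4 + ["Mar 19"] * 2,
    "model": list("AAABCC") + list("ABBB") + list("BB"),
})
# Each model has a single totalsoldcount value in the table above.
s["totalsoldcount"] = s["model"].map({"A": 110, "B": 50, "C": 70})

group_sums = s.groupby(["date", "model"])["totalsoldcount"].transform("sum")

# transform keeps the original row index, so the result lines up with s
# and can be stored directly as a new column.
assert group_sums.index.equals(s.index)
s["countsoldbymonth"] = group_sums

print(s)
```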

Or, if you just want to see the sums without creating a new column, you can call sum directly instead of transform:

print(s.groupby([
    'date',
    'model'
]).totalsoldcount.sum())

date    model
Feb 19  A        110
        B        150
Jan 19  A        330
        B         50
        C        140
Mar 19  B        100
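Note that the months in this output come back in alphabetical order (Feb, Jan, Mar), because groupby sorts the group keys by default. As a sketch, passing sort=False keeps the groups in order of first appearance, and as_index=False returns a flat dataframe instead of a MultiIndex series. The data is rebuilt here so the block runs on its own.

```python
import pandas as pd

s = pd.DataFrame({
    "date": ["Jan 19"] * 6 + ["Feb 19"] * 4 + ["Mar 19"] * 2,
    "model": list("AAABCC") + list("ABBB") + list("BB"),
})
s["totalsoldcount"] = s["model"].map({"A": 110, "B": 50, "C": 70})

# sort=False: keep groups in order of first appearance (Jan, Feb, Mar).
# as_index=False: return date and model as ordinary columns.
totals = s.groupby(["date", "model"], sort=False, as_index=False)["totalsoldcount"].sum()

print(totals)
```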

Edit

If you just want to know how many sales were made in each month, you can do the same groupby, but use count instead of sum:

df['CountSoldByMonth'] = df.groupby([
    'Date',
    'Model'
]).TotalSoldCount.transform('count')

print(df)

      Date Model  TotalSoldCount  CountSoldByMonth
0   Jan 19     A               4                 3
1   Jan 19     A               4                 3
2   Jan 19     A               4                 3
3   Jan 19     B               6                 1
4   Jan 19     C               2                 2
5   Jan 19     C               2                 2
6   Feb 19     A               4                 1
7   Feb 19     B               6                 3
8   Feb 19     B               6                 3
9   Feb 19     B               6                 3
10  Mar 19     B               6                 2
11  Mar 19     B               6                 2
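The example above groups on a Date column that already holds month labels such as "Jan 19". If the column instead holds full dates (the question's title mentions SoldDate), one possible approach is to derive a month key first. The sketch below assumes a datetime SoldDate column and uses made-up rows purely for illustration.

```python
import pandas as pd

# Hypothetical raw data: one row per sale, with a full date.
sales = pd.DataFrame({
    "SoldDate": pd.to_datetime([
        "2019-01-03", "2019-01-15", "2019-01-28", "2019-01-09",
        "2019-02-02", "2019-02-11", "2019-02-20",
    ]),
    "Model": ["A", "A", "A", "B", "A", "B", "B"],
})

# Month key derived from the date, e.g. Period('2019-01', 'M').
month = sales["SoldDate"].dt.to_period("M")

# Group by the derived month and the Model column, then count rows per group.
sales["CountSoldbyMonth"] = (
    sales.groupby([month, "Model"])["SoldDate"].transform("count")
)

print(sales)
```

Grouping by a period rather than a formatted string also keeps January 2019 and January 2020 apart if the data spans several years.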
