
Python Pandas GroupBy().Sum() Having Clause

I have a DataFrame with three columns: 'Order ID', 'Order Qty' and 'Fill Qty'.

I want to sum Fill Qty per order and then compare the total to Order Qty. Ideally, I would get back a DataFrame containing only the Order IDs whose aggregate Fill Qty is greater than Order Qty.

In SQL, I think what I'm looking for is:

SELECT "Order ID" FROM DataFrame GROUP BY "Order ID", "Order Qty" HAVING SUM("Fill Qty") > "Order Qty"

So far I have this:

SumFills = DataFrame.groupby(['Order ID', 'Order Qty']).sum()

Output:

                    Fill Qty
Order ID Order Qty
1        300             300
2        80               40
3        20               20
4        110             220
5        100             200
6        100             200

The output above is already aggregated. Ideally, I would like to return a list or array [4, 5, 6], since those orders have sum(Fill Qty) > Order Qty.

View original dataframe:

In [57]: print(original_df)
    Order Id  Fill Qty  Order Qty
0          1       419        334
1          2       392        152
2          3       167        469
3          4       470        359
4          5       447        441
5          6       154        190
6          7       365        432
7          8       209        181
8          9       140        136
9         10       112        358
10        11       384        302
11        12       307        376
12        13       119        237
13        14       147        342
14        15       279        197
15        16       280        137
16        17       148        381
17        18       313        498
18        19       193        328
19        20       291        193
20        21       100        357
21        22       161        286
22        23       453        168
23        24       349        283

Create and view new dataframe summing the Fill Qty:

In [58]: new_df = original_df.groupby(['Order Id','Order Qty'], as_index=False).sum()

In [59]: print(new_df)
    Order Id  Order Qty  Fill Qty
0          1        334       419
1          2        152       392
2          3        469       167
3          4        359       470
4          5        441       447
5          6        190       154
6          7        432       365
7          8        181       209
8          9        136       140
9         10        358       112
10        11        302       384
11        12        376       307
12        13        237       119
13        14        342       147
14        15        197       279
15        16        137       280
16        17        381       148
17        18        498       313
18        19        328       193
19        20        193       291
20        21        357       100
21        22        286       161
22        23        168       453
23        24        283       349
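In the sample above every Order Id appears exactly once, so the sum leaves Fill Qty unchanged. The sketch below shows the same groupby call on an order book with several fills per order. The `fills` frame and its individual rows are hypothetical; they are chosen only so that the per-order totals match the aggregated table in the question.

```python
import pandas as pd

# Hypothetical fill rows: orders 4, 5 and 6 each have two fills,
# picked so the totals match the question's table (220, 200, 200).
fills = pd.DataFrame({
    "Order Id":  [1, 2, 3, 4, 4, 5, 5, 6, 6],
    "Order Qty": [300, 80, 20, 110, 110, 100, 100, 100, 100],
    "Fill Qty":  [300, 40, 20, 110, 110, 100, 100, 100, 100],
})

# Sum only the Fill Qty column within each (Order Id, Order Qty) group;
# as_index=False keeps the group keys as ordinary columns.
totals = fills.groupby(["Order Id", "Order Qty"], as_index=False)["Fill Qty"].sum()
print(totals)
```

Selecting `["Fill Qty"]` before `.sum()` is a design choice in this sketch: it keeps any other numeric columns out of the aggregation.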

Slice new dataframe to only those rows where Fill Qty > Order Qty:

In [60]: new_df = new_df.loc[new_df['Fill Qty'] > new_df['Order Qty'],:]

In [61]: print(new_df)
    Order Id  Order Qty  Fill Qty
0          1        334       419
1          2        152       392
3          4        359       470
4          5        441       447
7          8        181       209
8          9        136       140
10        11        302       384
14        15        197       279
15        16        137       280
19        20        193       291
22        23        168       453
23        24        283       349
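To get just the list of order ids that the question asks for, the same mask can select a single column. The sketch below also shows `groupby(...).filter(...)`, which is probably the closest pandas counterpart to a SQL HAVING clause. The `fills` data is hypothetical, built to reproduce the totals in the question, and the filter variant assumes Order Qty is constant within each order.

```python
import pandas as pd

# Hypothetical fill rows whose per-order totals match the question's table.
fills = pd.DataFrame({
    "Order Id":  [1, 2, 3, 4, 4, 5, 5, 6, 6],
    "Order Qty": [300, 80, 20, 110, 110, 100, 100, 100, 100],
    "Fill Qty":  [300, 40, 20, 110, 110, 100, 100, 100, 100],
})

# Approach 1: aggregate, then apply a boolean mask and keep only the id column.
totals = fills.groupby(["Order Id", "Order Qty"], as_index=False)["Fill Qty"].sum()
overfilled_ids = totals.loc[totals["Fill Qty"] > totals["Order Qty"], "Order Id"].tolist()

# Approach 2: HAVING-style filter. Keeps the original rows of every group
# for which the function returns True (assumes one Order Qty per order).
overfilled_rows = fills.groupby("Order Id").filter(
    lambda g: g["Fill Qty"].sum() > g["Order Qty"].iloc[0]
)
overfilled_ids_via_filter = overfilled_rows["Order Id"].unique().tolist()

print(overfilled_ids)
print(overfilled_ids_via_filter)
```

The mask approach returns one row per order, while `filter` returns the underlying fill rows, which can be handy when the individual fills need to be inspected afterwards.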

