简体   繁体   中英

Sum values between two dates based on row conditions

I've been trying to sum values of two dataframes based on dates and also on others rows conditions, but have no idea on how to begin it with Python.

What I need to do is similar to a SUMIF in Excel/Google Sheets, it'll use columns 'loja' and 'item' and also consider if the dates on df1 and df2 are >= than the open date or <= on than the invent date on df3.

Look at the small versions of my dataframe.

I have df1 that looks like this:

    loja     date         item  sale
0   9288    2022-12-01  97845   10
1   9288    2022-12-03  97845   5
2   9288    2022-12-13  97845   6
3   9925    2022-12-02  33515   534
4   9288    2022-12-24  33515   839
5   9925    2022-12-12  33515   534
6   9925    2022-12-10  33515   234

And df2 that looks like this:

    loja    date       item    buy
0   9288    2022-12-01  97845   269
1   9288    2022-12-03  97845   415
2   9288    2022-12-30  97845   324
3   9925    2022-12-02  33515   116
4   9288    2022-12-24  33515   270
5   9925    2022-12-02  33515   248
6   9925    2022-12-15  33515   354

And df3 that look like this:

    loja    item    open    invent
0   9288    97845   2022-12-01  2022-12-20
1   9925    33515   2022-12-02  2022-12-12
2   9288    33515   2022-12-01  2022-12-20

So, in the end, I'm expecting to have a dataframe like this:

    loja    item    open    invent  buy sale
0   9288    97845   2022-12-01  2022-12-20  684 21
1   9925    33515   2022-12-02  2022-12-12  364 1302
2   9288    33515   2022-12-01  2022-12-20  0   0

Edit: thoses are small versions of my dataframes!

You need to use SUMIFS formula with evaluating loja, item and 2 dates. This is output: 在此处输入图像描述

Where formula in F20 (dragged down to F21 and F22 ) is =SUMIFS($E$11:$E$17,$C$11:$C$17,">="&D20,$C$11:$C$17,"<="&E20,$B$11:$B$17,B20,$D$11:$D$17,C20)

And formula in G20 (dragged down to G21 and G22 ) is =SUMIFS($E$2:$E$8,$C$2:$C$8,">="&D20,$C$2:$C$8,"<="&E20,$B$2:$B$8,B20,$D$2:$D$8,C20)

Run a merge between df1 and df3 , df2 and df3 , and run a sum on the groupby:

buy_df = (df3
          .merge(df2, on = ['loja', 'item'])
            # loc might be faster
            # ensure numexpr is installed 
            # for more performance if using query
          .query("open<=date<=invent")
         )

sale_df = (df3
          .merge(df1, on = ['loja', 'item'])
          .query("open<=date<=invent")
         )

(pd
.concat([buy_df,sale_df])
.groupby([*df3.columns])
.sum(numeric_only=True)
.reindex(df3, fill_value=0)
)

                                    buy    sale
loja item  open       invent
9288 97845 2022-12-01 2022-12-20  684.0    21.0
9925 33515 2022-12-02 2022-12-12  364.0  1302.0
9288 33515 2022-12-01 2022-12-20    0.0     0.0

Another option is with conditional_join from pyjanitor - depending on the data size, this option might be faster:

# pip install pyjanitor
import pandas as pd
import janitor

sales = (df3
         .conditional_join(
             df1, 
             ('open', 'date', '<='), 
             ('invent', 'date', '>='), 
             ('loja', 'loja', '=='), 
             ('item', 'item', '=='), 
             how = 'left', 
             right_columns='sale')
         )

buy  =  (df3
         .conditional_join(
             df2, 
             ('open', 'date', '<='), 
             ('invent', 'date', '>='), 
             ('loja', 'loja', '=='), 
             ('item', 'item', '=='), 
             how = 'left', 
             right_columns='buy')
         )

pd.concat([buy, sales]).groupby([*df3.columns]).sum()

                                    buy    sale
loja item  open       invent
9288 33515 2022-12-01 2022-12-20    0.0     0.0
     97845 2022-12-01 2022-12-20  684.0    21.0
9925 33515 2022-12-02 2022-12-12  364.0  1302.0

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM