
How to Count/Sum Values Based on Multiple Conditions in Multiple Columns

I have a shipping records table with approximately 100K rows. For each row, I want to calculate how much quantity of that row's material was shipped in the 30 days before its shipping date. As the example below shows, the calculated quantity depends on both the material and the shipping date. I wrote a very basic filter that works for a single row, but I couldn't find a way to apply it to all rows.

# material and shippingDate are the values taken from the row being evaluated
df[(df['material'] == material) & (df['shippingDate'] < shippingDate) & (df['shippingDate'] >= (shippingDate - pd.Timedelta(days=30)))]['qty'].sum()

material  shippingDate  qty  shipped qtys in last 30 days
A         23.01.2019      8  0
A         28.01.2019     41  8
A         31.01.2019     66  49 (8+41)
A         20.03.2019     67  0
B         17.02.2019     53  0
B         26.02.2019     35  53
B         11.03.2019      4  88 (53+35)
B         20.03.2019     67  39 (35+4)
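
For reference, the single-row filter above can be applied to every row with DataFrame.apply. The following is only a minimal sketch: the helper name shipped_in_last_30_days is hypothetical, and the sample data is copied from the table. Because every row scans the whole frame, this approach is likely to be slow on 100K rows, but it is handy for checking a faster solution on a small sample.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "material": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "shippingDate": ["23.01.2019", "28.01.2019", "31.01.2019", "20.03.2019",
                     "17.02.2019", "26.02.2019", "11.03.2019", "20.03.2019"],
    "qty": [8, 41, 66, 67, 53, 35, 4, 67],
})
df["shippingDate"] = pd.to_datetime(df["shippingDate"], format="%d.%m.%Y")


def shipped_in_last_30_days(row, frame):
    """Hypothetical helper: qty of the same material shipped in the 30 days before this row."""
    window_start = row["shippingDate"] - pd.Timedelta(days=30)
    mask = (
        (frame["material"] == row["material"])
        & (frame["shippingDate"] < row["shippingDate"])   # exclude the row itself
        & (frame["shippingDate"] >= window_start)
    )
    return frame.loc[mask, "qty"].sum()


# axis=1 calls the helper once per row; frame=df is forwarded as a keyword argument.
brute_force = df.apply(shipped_in_last_30_days, axis=1, frame=df)
print(df.assign(**{"shipped qtys in last 30 days": brute_force}))
```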

You can use .groupby with .rolling:

import pandas as pd

# convert shippingDate to datetime:
df["shippingDate"] = pd.to_datetime(df["shippingDate"], dayfirst=True)

# sort the values (if they aren't already)
df = df.sort_values(["material", "shippingDate"])

df["shipped qtys in last 30 days"] = (
    df.groupby("material")
    .rolling("30D", on="shippingDate", closed="left")["qty"]
    .sum()
    .fillna(0)
    .values
)
print(df)

Prints:

  material shippingDate  qty  shipped qtys in last 30 days
0        A   2019-01-23    8                           0.0
1        A   2019-01-28   41                           8.0
2        A   2019-01-31   66                          49.0
3        A   2019-03-20   67                           0.0
4        B   2019-02-17   53                           0.0
5        B   2019-02-26   35                          53.0
6        B   2019-03-11    4                          88.0
7        B   2019-03-20   67                          39.0
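
The closed="left" argument is what keeps the current row out of its own window: the window covers [t - 30 days, t) rather than the default (t - 30 days, t] used for time-based windows. Here is a small sketch comparing the two on the sample data; the helper name rolling_qty is hypothetical and the data is copied from the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "material": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "shippingDate": ["23.01.2019", "28.01.2019", "31.01.2019", "20.03.2019",
                     "17.02.2019", "26.02.2019", "11.03.2019", "20.03.2019"],
    "qty": [8, 41, 66, 67, 53, 35, 4, 67],
})
df["shippingDate"] = pd.to_datetime(df["shippingDate"], format="%d.%m.%Y")
df = df.sort_values(["material", "shippingDate"])


def rolling_qty(frame, closed):
    """Hypothetical helper: 30-day rolling qty sum per material for a given window closure."""
    return (
        frame.groupby("material")
        .rolling("30D", on="shippingDate", closed=closed)["qty"]
        .sum()
        .fillna(0)       # an empty window yields NaN, which is treated as 0 here
        .to_numpy()
    )


excluding_current = rolling_qty(df, "left")   # window [t - 30 days, t)
including_current = rolling_qty(df, "right")  # window (t - 30 days, t]

print(df.assign(left=excluding_current, right=including_current))
```

With closed="right", each row's own qty is counted in its total, which is usually not what "shipped in the previous 30 days" means.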

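EDIT: Added .sort_values() before the groupby. The result is assigned positionally through .values, so the rows must already be ordered by material and shippingDate.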
