I have a shipping records table with approximately 100K rows. For each row, I want to calculate how much of that row's material was shipped in the preceding 30 days. As the example below shows, the calculated quantity depends on the material and the shipping date. I wrote a very basic filter that works for a single row, but I couldn't find a way to apply it to all rows.
df[(df['malzeme'] == material) & (df['cikistarihi'] < shippingDate) & (df['cikistarihi'] >= (shippingDate - pd.Timedelta(days=30)))]['qty'].sum()
material | shippingDate | qty | shipped qtys in last 30 days |
---|---|---|---|
A | 23.01.2019 | 8 | 0 |
A | 28.01.2019 | 41 | 8 |
A | 31.01.2019 | 66 | 49 (8+41) |
A | 20.03.2019 | 67 | 0 |
B | 17.02.2019 | 53 | 0 |
B | 26.02.2019 | 35 | 53 |
B | 11.03.2019 | 4 | 88 (53+35) |
B | 20.03.2019 | 67 | 39 (35+4) |
You can use .groupby with .rolling:
import pandas as pd

# convert shippingDate to datetime (the dates are day-first):
df["shippingDate"] = pd.to_datetime(df["shippingDate"], dayfirst=True)
# sort the values (if they aren't already)
df = df.sort_values(["material", "shippingDate"])
# per material, sum qty over the 30 days before each row;
# closed="left" excludes the row's own timestamp from the window
df["shipped qtys in last 30 days"] = (
    df.groupby("material")
    .rolling("30D", on="shippingDate", closed="left")["qty"]
    .sum()
    .fillna(0)
    .values
)
print(df)
Prints:
material shippingDate qty shipped qtys in last 30 days
0 A 2019-01-23 8 0.0
1 A 2019-01-28 41 8.0
2 A 2019-01-31 66 49.0
3 A 2019-03-20 67 0.0
4 B 2019-02-17 53 0.0
5 B 2019-02-26 35 53.0
6 B 2019-03-11 4 88.0
7 B 2019-03-20 67 39.0
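As a cross-check, the filter from the question can be applied row by row and compared with the rolling result. This is only a sketch: the sample frame is rebuilt from the table in the question, and the helper `shipped_before` and the column names `rolling_30d` and `brute_30d` are made up for illustration. The row-wise version does a full scan for every row, so it is likely to be far too slow for 100K rows; it is here only to confirm that both approaches agree on the sample data.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "material": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "shippingDate": ["23.01.2019", "28.01.2019", "31.01.2019", "20.03.2019",
                     "17.02.2019", "26.02.2019", "11.03.2019", "20.03.2019"],
    "qty": [8, 41, 66, 67, 53, 35, 4, 67],
})
df["shippingDate"] = pd.to_datetime(df["shippingDate"], dayfirst=True)
df = df.sort_values(["material", "shippingDate"])

# Vectorised version: per-material 30-day window that excludes the current row.
df["rolling_30d"] = (
    df.groupby("material")
    .rolling("30D", on="shippingDate", closed="left")["qty"]
    .sum()
    .fillna(0)
    .to_numpy()
)

def shipped_before(row, frame, days=30):
    """Hypothetical helper: the question's filter, evaluated for one row."""
    start = row["shippingDate"] - pd.Timedelta(days=days)
    mask = (
        (frame["material"] == row["material"])
        & (frame["shippingDate"] < row["shippingDate"])
        & (frame["shippingDate"] >= start)
    )
    return frame.loc[mask, "qty"].sum()

# Row-wise version: one full scan of the frame per row (quadratic cost).
df["brute_30d"] = df.apply(lambda row: shipped_before(row, df), axis=1)

print(df)
```

On the sample data both columns hold the same quantities, including 39 for the last B row, because the shipment on 17.02.2019 lies 31 days before 20.03.2019 and falls outside the window.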
EDIT: Added .sort_values() before the groupby.