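How to resample weekly data from daily data with groupby in pandas?
I have the following dataset. For each whole month, I need the week start (Monday) and week end (Sunday) of every week, and the result column should hold the weekly sum of the data for each group (country and product).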
SAMPLE INPUT
all_dates country product result
10/22/2021 A Broadband 13
10/23/2021 A Broadband 8
10/24/2021 A Broadband 7
10/25/2021 A Broadband 36
8/4/2021 C TV 2
8/7/2021 C TV 1
EXPECTED OUTPUT
week_start week_end product country result
10/4/2021 10/10/2021 Broadband A 0
10/11/2021 10/17/2021 Broadband A 0
10/18/2021 10/24/2021 Broadband A 28
10/25/2021 10/31/2021 Broadband A 36
8/2/2021 8/8/2021 TV C 3
8/9/2021 8/15/2021 TV C 0
8/16/2021 8/22/2021 TV C 0
8/23/2021 8/29/2021 TV C 0
8/30/2021 9/5/2021 TV C 0
I tried the following logic, but I could not get the expected result.
**first try**
df1 = (df.set_index('all_dates').groupby(['product','country'])['result'].resample('W-MON').sum().reset_index().rename(columns={'all_dates':'week_start'}))
df1.insert(3, 'week_enddate', df1['week_startdate'] + pd.offsets.DateOffset(days=6))
**second try**
weekly = df.groupby(by=['product','country', pd.Grouper(key='all_dates', freq='W')])['result'].sum().reset_index()
weekly = weekly.rename({'all_dates': 'week_start'}, axis=1)
weekly['week_end'] = weekly['week_start'] + pd.offsets.Week(weekday=5)
**third try**
df['start'] = df['all_dates'] - pd.offsets.Week(weekday=6)
df['end'] = df['start'] + pd.offsets.Week(weekday=5)
df3 =df.groupby(['start','end','product','country'])['metric_result'].sum().reset_index()
df3
Is there another way to achieve this?
So, using your sample input:
import pandas as pd
df = pd.DataFrame(
{
"all_dates": {
0: "10/22/2021",
1: "10/23/2021",
2: "10/24/2021", # sunday
3: "10/25/2021",
4: "8/4/2021",
5: "8/7/2021",
},
"country": {0: "A", 1: "A", 2: "A", 3: "A", 4: "C", 5: "C"},
"product": {
0: "Broadband",
1: "Broadband",
2: "Broadband",
3: "Broadband",
4: "TV",
5: "TV",
},
"result": {0: 13, 1: 8, 2: 7, 3: 36, 4: 2, 5: 1},
}
)
You could try this:
# Setup: parse the dates and derive the ISO year and ISO week number (ISO weeks run Monday to Sunday)
df["all_dates"] = pd.to_datetime(df["all_dates"])
df["year"] = df["all_dates"].dt.isocalendar().year
df["week_num"] = df["all_dates"].dt.isocalendar().week
# Find aggregated values (select "result" explicitly: summing the datetime column fails in pandas 2.x)
agg_df = (
    df.groupby(by=["year", "week_num", "country", "product"])[["result"]]
    .sum()
    .sort_values(by=["country", "product"], ascending=True)
    .reset_index()
)
# Add aggregated values to sliced original dataframe
# (merge on every group key so two groups sharing a week are not mixed up)
keys = ["year", "week_num", "country", "product"]
new_df = (
    pd.merge(left=df[["all_dates"] + keys], right=agg_df, on=keys, how="inner")
    .drop_duplicates(subset=keys)
    .drop(columns=["year", "week_num"])
    .reset_index(drop=True)
)
# Add first and last day of each week
new_df["week_start"] = new_df["all_dates"] - new_df[
    "all_dates"
].dt.weekday * pd.Timedelta(days=1)
# Monday + 6 days = Sunday; this also holds when the kept date is itself a Sunday
new_df["week_end"] = new_df["week_start"] + pd.Timedelta(days=6)
# Cleanup
new_df = new_df[["week_start", "week_end", "product", "country", "result"]]
Then:
print(new_df)
# Output
week_start week_end product country result
0 2021-10-18 2021-10-24 Broadband A 28
1 2021-10-25 2021-10-31 Broadband A 36
2 2021-08-02 2021-08-08 TV C 3
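Note that the output above only contains weeks that actually have data, while the expected output in the question also lists the empty weeks of each month with a result of 0. Here is a minimal sketch of one way to fill those in. It rests on an assumption read off the expected output, not something the question states: a week belongs to a month when its Monday falls in that month. The names `mondays`, `sums`, `pieces` and `out` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "all_dates": ["10/22/2021", "10/23/2021", "10/24/2021",
                      "10/25/2021", "8/4/2021", "8/7/2021"],
        "country": ["A", "A", "A", "A", "C", "C"],
        "product": ["Broadband"] * 4 + ["TV"] * 2,
        "result": [13, 8, 7, 36, 2, 1],
    }
)
df["all_dates"] = pd.to_datetime(df["all_dates"], format="%m/%d/%Y")

# Monday of the week each date falls in (weekday: Monday=0 ... Sunday=6)
df["week_start"] = df["all_dates"] - pd.to_timedelta(df["all_dates"].dt.weekday, unit="D")

pieces = []
for (product, country), grp in df.groupby(["product", "country"], sort=False):
    # Assumption: cover every month between the group's first and last date
    first_day = grp["all_dates"].min().replace(day=1)
    last_day = grp["all_dates"].max() + pd.offsets.MonthEnd(0)
    # All Mondays inside that span; each one starts a reported week
    mondays = pd.date_range(first_day, last_day, freq="W-MON")
    # Weekly sums, with 0 for Mondays that have no data
    sums = (
        grp.groupby("week_start")["result"]
        .sum()
        .reindex(mondays, fill_value=0)
        .rename_axis("week_start")
    )
    piece = sums.reset_index()
    piece["week_end"] = piece["week_start"] + pd.Timedelta(days=6)
    piece["product"] = product
    piece["country"] = country
    pieces.append(piece)

out = pd.concat(pieces, ignore_index=True)[
    ["week_start", "week_end", "product", "country", "result"]
]
print(out)
```

For the sample input this should give the nine rows of the expected output (four October weeks for Broadband/A, five August weeks for TV/C). One caveat of the Monday rule: data on the first days of a month that fall before its first Monday would be dropped by the `reindex`, so adjust `first_day` if those days matter.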