
How to resample weekly data from daily data with groupby in pandas?

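I have the following dataset. For each whole month, I need the start (Monday) and end (Sunday) of every week, and the result column should hold the weekly sum for each group (country and product).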

SAMPLE INPUT

all_dates     country      product      result
10/22/2021     A          Broadband       13
10/23/2021     A          Broadband       8
10/24/2021     A          Broadband       7
10/25/2021     A          Broadband       36
8/4/2021       C          TV              2
8/7/2021       C          TV              1

EXPECTED OUTPUT

week_start     week_end         product      country  result
10/4/2021      10/10/2021       Broadband     A        0
10/11/2021     10/17/2021       Broadband     A        0
10/18/2021     10/24/2021       Broadband     A        28
10/25/2021     10/31/2021       Broadband     A        36
8/2/2021       8/8/2021         TV            C        3
8/9/2021       8/15/2021        TV            C        0
8/16/2021      8/22/2021        TV            C        0
8/23/2021      8/29/2021        TV            C        0
8/30/2021      9/5/2021         TV            C        0

I tried the following approaches, but none of them gave the expected result.

**first try**

df1 = (df.set_index('all_dates').groupby(['product','country'])['result'].resample('W-MON').sum().reset_index().rename(columns={'all_dates':'week_start'}))
df1.insert(3, 'week_enddate', df1['week_startdate'] +  pd.offsets.DateOffset(days=6))

**second try**

weekly = df.groupby(by=['product','country', pd.Grouper(key='all_dates', freq='W')])['result'].sum().reset_index()
weekly = weekly.rename({'all_dates': 'week_start'}, axis=1)
weekly['week_end'] = weekly['week_start'] + pd.offsets.Week(weekday=5)

**third try**

df['start'] = df['all_dates'] - pd.offsets.Week(weekday=6)
df['end'] = df['start'] + pd.offsets.Week(weekday=5)
df3 =df.groupby(['start','end','product','country'])['metric_result'].sum().reset_index()
df3

Is there another way to achieve this?

So, using your sample input:

import pandas as pd

df = pd.DataFrame(
    {
        "all_dates": {
            0: "10/22/2021",
            1: "10/23/2021",
            2: "10/24/2021",  # sunday
            3: "10/25/2021",
            4: "8/4/2021",
            5: "8/7/2021",
        },
        "country": {0: "A", 1: "A", 2: "A", 3: "A", 4: "C", 5: "C"},
        "product": {
            0: "Broadband",
            1: "Broadband",
            2: "Broadband",
            3: "Broadband",
            4: "TV",
            5: "TV",
        },
        "result": {0: 13, 1: 8, 2: 7, 3: 36, 4: 2, 5: 1},
    }
)

You can try this:

# Setup
df["all_dates"] = pd.to_datetime(df["all_dates"])
df["year"] = df["all_dates"].dt.isocalendar().year
df["week_num"] = df["all_dates"].dt.isocalendar().week

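# Find aggregated values (select "result" explicitly: recent pandas versions
# raise a TypeError when asked to sum the datetime column "all_dates")
agg_df = (
    df.groupby(by=["year", "week_num", "country", "product"])["result"]
    .sum()
    .reset_index()
    .sort_values(by=["country", "product"], ascending=True)
    .reset_index(drop=True)
)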

# Add aggregated values to sliced original dataframe, matching on the full
# group key so that groups sharing a week number are not mixed up
keys = ["year", "week_num", "country", "product"]
new_df = (
    pd.merge(left=df[["all_dates"] + keys], right=agg_df, on=keys, how="inner")
    .drop_duplicates(subset=keys)
    .drop(columns=["year", "week_num"])
    .reset_index(drop=True)
)

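# Add first (Monday) and last (Sunday) day of each week
new_df["week_start"] = new_df["all_dates"] - pd.to_timedelta(
    new_df["all_dates"].dt.weekday, unit="D"
)
# Derive the Sunday from the Monday; adding pd.offsets.Week(weekday=6) to a
# date that is already a Sunday would jump to the following Sunday
new_df["week_end"] = new_df["week_start"] + pd.Timedelta(days=6)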

# Cleanup
new_df = new_df[["week_start", "week_end", "product", "country", "result"]]

Then:

print(new_df)

# Output
  week_start   week_end    product country  result
0 2021-10-18 2021-10-24  Broadband       A      28
1 2021-10-25 2021-10-31  Broadband       A      36
2 2021-08-02 2021-08-08         TV       C       3
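
The output above lists only the weeks that contain data, while the expected output in the question also has a zero row for every other week of the month. A minimal sketch of one way to fill those gaps follows. It assumes that "every week of the month" means every week whose Monday falls in a month where the group has data, which matches the expected table but is not stated explicitly in the question. The helper name `month_mondays` and the variable `full_df` are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "all_dates": ["10/22/2021", "10/23/2021", "10/24/2021",
                      "10/25/2021", "8/4/2021", "8/7/2021"],
        "country": ["A", "A", "A", "A", "C", "C"],
        "product": ["Broadband"] * 4 + ["TV"] * 2,
        "result": [13, 8, 7, 36, 2, 1],
    }
)
df["all_dates"] = pd.to_datetime(df["all_dates"], format="%m/%d/%Y")

# Monday of the week each date belongs to (weekday: Monday=0 ... Sunday=6)
df["week_start"] = df["all_dates"] - pd.to_timedelta(
    df["all_dates"].dt.weekday, unit="D"
)

# Weekly sums for the weeks that actually contain data
weekly = df.groupby(["product", "country", "week_start"], as_index=False)[
    "result"
].sum()


def month_mondays(week_starts):
    """Hypothetical helper: every Monday in each month present in week_starts."""
    months = week_starts.dt.to_period("M").unique()
    mondays = [
        day
        for month in months
        for day in pd.date_range(month.start_time, month.end_time, freq="W-MON")
    ]
    return pd.DatetimeIndex(sorted(mondays))


# Reindex each (product, country) group onto its full grid of Mondays,
# filling weeks without data with 0
parts = []
for (product, country), group in weekly.groupby(["product", "country"], sort=False):
    grid = month_mondays(group["week_start"])
    filled = group.set_index("week_start")["result"].reindex(grid, fill_value=0)
    parts.append(
        pd.DataFrame(
            {
                "week_start": grid,
                "week_end": grid + pd.Timedelta(days=6),
                "product": product,
                "country": country,
                "result": filled.to_numpy(),
            }
        )
    )

full_df = pd.concat(parts, ignore_index=True)
print(full_df)
```

For the sample input this should give nine rows: four October weeks for Broadband/A and five August weeks for TV/C. The grid is built from the months of `week_start` rather than `all_dates`, so a week that straddles two months is attributed to the month of its Monday; adjust that choice if your definition of a month's weeks differs.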

