
How to resample weekly data from daily data with groupby in pandas?

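I have the following dataset. For each whole month, I need the week start (Monday) and week end (Sunday) of every week, and the result column should hold the weekly sum for each group (country and product).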

SAMPLE INPUT

all_dates     country      product      result
10/22/2021     A          Broadband       13
10/23/2021     A          Broadband       8
10/24/2021     A          Broadband       7
10/25/2021     A          Broadband       36
8/4/2021       C          TV              2
8/7/2021       C          TV              1

EXPECTED OUTPUT

week_start     week_end         product      country  result
10/4/2021      10/10/2021       Broadband     A        0
10/11/2021     10/17/2021       Broadband     A        0
10/18/2021     10/24/2021       Broadband     A        28
10/25/2021     10/31/2021       Broadband     A        36
8/2/2021       8/8/2021         TV            C        3
8/9/2021       8/15/2021        TV            C        0
8/16/2021      8/22/2021        TV            C        0
8/23/2021      8/29/2021        TV            C        0
8/30/2021      9/5/2021         TV            C        0

I tried the following logic, but I could not get the expected result.

**first try**

df1 = (df.set_index('all_dates').groupby(['product','country'])['result'].resample('W-MON').sum().reset_index().rename(columns={'all_dates':'week_start'}))
df1.insert(3, 'week_end', df1['week_start'] + pd.offsets.DateOffset(days=6))

**second try**

weekly = df.groupby(by=['product','country', pd.Grouper(key='all_dates', freq='W')])['result'].sum().reset_index()
weekly = weekly.rename({'all_dates': 'week_start'}, axis=1)
weekly['week_end'] = weekly['week_start'] + pd.offsets.Week(weekday=5)

**third try**

df['start'] = df['all_dates'] - pd.offsets.Week(weekday=6)
df['end'] = df['start'] + pd.offsets.Week(weekday=5)
df3 = df.groupby(['start','end','product','country'])['result'].sum().reset_index()
df3

Is there any other way to achieve this?
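A note on the resampling attempts: pandas weekly frequencies are anchored on the day that ends the week, and bins are labelled by that closing day by default. So `'W'` (short for `'W-SUN'`) yields Sunday labels, and `'W-MON'` yields weeks that end on Monday rather than start on it. The sketch below illustrates this on a small hypothetical series built from the Broadband rows; passing `closed="left", label="left"` is one way to get Monday-start labels instead.

```python
import pandas as pd

# Hypothetical daily series: Fri 2021-10-22 through Mon 2021-10-25
s = pd.Series(
    [13, 8, 7, 36],
    index=pd.to_datetime(["2021-10-22", "2021-10-23", "2021-10-24", "2021-10-25"]),
)

# "W" means "W-SUN": each bin ends on a Sunday and is labelled by that Sunday
by_sunday = s.resample("W").sum()

# "W-MON": each bin ends on a Monday, so Mon 10/25 is pooled with the days before it
by_monday = s.resample("W-MON").sum()

# Bins that start on Monday (inclusive) and are labelled by that Monday
monday_start = s.resample("W-MON", closed="left", label="left").sum()

print(by_sunday)
print(by_monday)
print(monday_start)
```

With Monday-start labels, the week end is simply the label plus six days.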

So, using your sample input:

import pandas as pd

df = pd.DataFrame(
    {
        "all_dates": {
            0: "10/22/2021",
            1: "10/23/2021",
            2: "10/24/2021",  # sunday
            3: "10/25/2021",
            4: "8/4/2021",
            5: "8/7/2021",
        },
        "country": {0: "A", 1: "A", 2: "A", 3: "A", 4: "C", 5: "C"},
        "product": {
            0: "Broadband",
            1: "Broadband",
            2: "Broadband",
            3: "Broadband",
            4: "TV",
            5: "TV",
        },
        "result": {0: 13, 1: 8, 2: 7, 3: 36, 4: 2, 5: 1},
    }
)

You can try this:

# Setup
df["all_dates"] = pd.to_datetime(df["all_dates"])
df["year"] = df["all_dates"].dt.isocalendar().year
df["week_num"] = df["all_dates"].dt.isocalendar().week

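# Find aggregated values
# Sum only the numeric "result" column; "all_dates" is a datetime column and cannot be summed
agg_df = (
    df.groupby(by=["year", "week_num", "country", "product"])[["result"]]
    .sum()
    .sort_values(by=["country", "product"], ascending=True)
    .reset_index()
)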

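# Add aggregated values to sliced original dataframe
# Merge on all group keys so that two groups sharing a week number stay separate
new_df = (
    pd.merge(
        left=df[["all_dates", "year", "week_num", "country", "product"]],
        right=agg_df,
        on=["year", "week_num", "country", "product"],
        how="inner",
    )
    .drop_duplicates(subset=["year", "week_num", "country", "product"])
    .drop(columns=["year", "week_num"])
    .reset_index(drop=True)
)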

# Add first (Monday) and last (Sunday) day of each week
new_df["week_start"] = new_df["all_dates"] - pd.to_timedelta(
    new_df["all_dates"].dt.weekday, unit="D"
)
# The Sunday that closes a week is always six days after its Monday
new_df["week_end"] = new_df["week_start"] + pd.Timedelta(days=6)

# Cleanup
new_df = new_df[["week_start", "week_end", "product", "country", "result"]]
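One detail worth checking when deriving week boundaries from a date: `pd.offsets.Week(weekday=6)` rolls a date forward to the next Sunday, and a date that already is a Sunday moves a full week ahead. Computing the Monday first and adding six days avoids that. A small sketch with three hypothetical dates (a Friday, a Sunday and a Monday):

```python
import pandas as pd

# Fri 2021-10-22, Sun 2021-10-24, Mon 2021-10-25
dates = pd.Series(pd.to_datetime(["2021-10-22", "2021-10-24", "2021-10-25"]))

# Monday of each date's week: subtract the weekday number (Monday=0 ... Sunday=6)
week_start = dates - pd.to_timedelta(dates.dt.weekday, unit="D")

# Sunday of the same week: six days after that Monday
week_end = week_start + pd.Timedelta(days=6)

# For comparison: the Week offset sends Sunday 10/24 to the following Sunday
rolled = dates + pd.offsets.Week(weekday=6)

print(pd.DataFrame({"date": dates, "week_start": week_start,
                    "week_end": week_end, "rolled": rolled}))
```

Here `week_end` for Sunday 10/24 stays 10/24, while `rolled` gives 10/31.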

Then:

print(new_df)

# Output
  week_start   week_end    product country  result
0 2021-10-18 2021-10-24  Broadband       A      28
1 2021-10-25 2021-10-31  Broadband       A      36
2 2021-08-02 2021-08-08         TV       C       3
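The output above only lists weeks that contain data, whereas the expected output in the question also includes the empty weeks of each month with a result of 0. One possible way to add them is sketched below. It assumes that "each whole month" means every Monday falling inside the month(s) covered by a group's data, which matches the expected table but is an interpretation. The helper name `fill_month_weeks` is made up for this sketch.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "all_dates": ["10/22/2021", "10/23/2021", "10/24/2021",
                      "10/25/2021", "8/4/2021", "8/7/2021"],
        "country": ["A", "A", "A", "A", "C", "C"],
        "product": ["Broadband"] * 4 + ["TV"] * 2,
        "result": [13, 8, 7, 36, 2, 1],
    }
)
df["all_dates"] = pd.to_datetime(df["all_dates"], format="%m/%d/%Y")

# Monday of the week each row belongs to
df["week_start"] = df["all_dates"] - pd.to_timedelta(df["all_dates"].dt.weekday, unit="D")

# Weekly totals for the weeks that actually contain data
weekly = df.groupby(["product", "country", "week_start"], as_index=False)["result"].sum()


def fill_month_weeks(group: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one row per Monday of the covered month(s), 0 where no data."""
    first_day = group["week_start"].min().replace(day=1)
    last_day = group["week_start"].max() + pd.offsets.MonthEnd(0)
    mondays = pd.date_range(first_day, last_day, freq="W-MON", name="week_start")
    filled = group.set_index("week_start")[["result"]].reindex(mondays, fill_value=0)
    return filled.reset_index()


parts = []
for (product, country), group in weekly.groupby(["product", "country"]):
    filled = fill_month_weeks(group)
    filled["product"] = product
    filled["country"] = country
    parts.append(filled)

full = pd.concat(parts, ignore_index=True)
full["week_end"] = full["week_start"] + pd.Timedelta(days=6)
full = full[["week_start", "week_end", "product", "country", "result"]]
print(full)
```

Because the month range is taken from the week-start Mondays, a row dated early in a month whose week began in the previous month pulls that previous month into the range; adjust `first_day` if that is not wanted.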
