Pandas - Calculate the sum of a column as week-wise columns

Question

I have a table like the one below, containing values for multiple IDs:

ID	value	date
1	20	2022-01-01 12:20
2	25	2022-01-04 18:20
1	10	2022-01-04 11:20
1	150	2022-01-06 16:20
2	200	2022-01-08 13:20
3	40	2022-01-04 21:20
1	75	2022-01-09 08:20

I would like to calculate the week-wise sum of values for all IDs:

The start date is given (for example, 01-01-2022).
Weeks are calculated based on the range:
- every Saturday 00:00 to the next Friday 23:59 (i.e. Week 1 is from 01-01-2022 00:00 to 07-01-2022 23:59)

ID	Week 1 sum	Week 2 sum	Week 3 sum	...
1	180	75	--	--
2	25	200	--	--
3	40	--	--	--

Answer 1

There's a pandas function (pd.Grouper) that lets you specify a groupby instruction.¹ In this case, that specification is to "resample" date by a weekly frequency anchored on Fridays, so that each weekly bin ends on a Friday.² Since you also need to group by ID, add it to the groupby keys as well.

# convert to datetime
df['date'] = pd.to_datetime(df['date'])
# pivot the dataframe
df1 = (
    df.groupby(['ID', pd.Grouper(key='date', freq='W-FRI')])['value'].sum()
    .unstack(fill_value=0)
)
# rename columns
df1.columns = [f"Week {c} sum" for c in range(1, df1.shape[1]+1)]
df1 = df1.reset_index()

¹ What you actually need is a pivot_table result, but groupby + unstack is equivalent to pivot_table and is more convenient here.

² Because Jan 1, 2022 is a Saturday and each week runs from Saturday to Friday, you need to anchor the frequency on Friday (W-FRI weeks end on Friday).
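As a quick check of the approach above, here is a minimal sketch that runs it on the sample rows from the question and compares it with the pivot_table form mentioned in footnote ¹. The DataFrame is typed in by hand from the question's table, and the names by_groupby and by_pivot are illustrative only.

```python
import pandas as pd

# Sample rows from the question, typed in by hand
df = pd.DataFrame({
    "ID": [1, 2, 1, 1, 2, 3, 1],
    "value": [20, 25, 10, 150, 200, 40, 75],
    "date": pd.to_datetime([
        "2022-01-01 12:20", "2022-01-04 18:20", "2022-01-04 11:20",
        "2022-01-06 16:20", "2022-01-08 13:20", "2022-01-04 21:20",
        "2022-01-09 08:20",
    ]),
})

# groupby + unstack, as in the answer: W-FRI bins end on Friday
by_groupby = (
    df.groupby(["ID", pd.Grouper(key="date", freq="W-FRI")])["value"].sum()
    .unstack(fill_value=0)
)

# The same table via pivot_table, passing a Grouper as the column key
by_pivot = df.pivot_table(
    index="ID",
    columns=pd.Grouper(key="date", freq="W-FRI"),
    values="value",
    aggfunc="sum",
    fill_value=0,
)

print(by_groupby)
print(by_pivot)
```

Before renaming, the column labels are the Fridays that close each week (2022-01-07 and 2022-01-14 for this sample), which makes it easy to confirm that the bins match the Saturday-to-Friday definition.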

Answer 2

You can compute a week column. If all your data falls within the same year, you can extract just the week number, but that is unlikely in real-world scenarios. If your data spans multiple years, it is wiser to derive a combination of year and week number.

df['year_weeknumber'] = df['date'].dt.strftime('%Y-%U')

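In your case, the dates 2022-01-01 and 2022-01-04 18:20 should both be converted to 2022-01 under the scenario you described. Note that %U counts weeks from Sunday and puts the days before the year's first Sunday in week 00, so 2022-01-01 (a Saturday) actually comes out as 2022-00.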

To calculate your pivot table, you can use pandas pivot_table. Example code:

pd.pivot_table(df, values='value', index=['ID'], columns=['year_weeknumber'], aggfunc='sum')
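If the weeks have to follow the question's Saturday-to-Friday definition, one alternative to strftime week numbers is to count 7-day blocks from the given start date and pivot on that. A minimal sketch, assuming the start date 2022-01-01 from the question; the column name week and the variable names are hypothetical:

```python
import pandas as pd

# Sample rows from the question, typed in by hand
df = pd.DataFrame({
    "ID": [1, 2, 1, 1, 2, 3, 1],
    "value": [20, 25, 10, 150, 200, 40, 75],
    "date": pd.to_datetime([
        "2022-01-01 12:20", "2022-01-04 18:20", "2022-01-04 11:20",
        "2022-01-06 16:20", "2022-01-08 13:20", "2022-01-04 21:20",
        "2022-01-09 08:20",
    ]),
})

# Assumption: the given start date is a Saturday at 00:00
start = pd.Timestamp("2022-01-01")

# Whole days elapsed since the start, split into 7-day blocks, numbered from 1
df["week"] = (df["date"] - start).dt.days // 7 + 1

table = pd.pivot_table(
    df, values="value", index="ID", columns="week",
    aggfunc="sum", fill_value=0,
)
# Label each column with its actual week number
table.columns = [f"Week {w} sum" for w in table.columns]
print(table.reset_index())
```

Because the week number is derived from the start date rather than from the calendar year, this sketch does not need a separate year component for data that spans several years.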

Answer 3

Let's define a formatting helper.

def fmt(row):
    return f"{row.year}-{row.week:02d}"  # We ignore row.day

Now it's easy.

>>> df = pd.DataFrame([dict(id=1, value=20, date="2022-01-01 12:20"),
                       dict(id=2, value=25, date="2022-01-04 18:20")])
>>> df['date'] = pd.to_datetime(df.date)
>>> df['iso'] = df.date.dt.isocalendar().apply(fmt, axis='columns')
>>> df
   id  value                date      iso
0   1     20 2022-01-01 12:20:00  2021-52
1   2     25 2022-01-04 18:20:00  2022-01

Then just group by the ISO week.
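That last step might look like the following minimal sketch, which applies the fmt helper to all seven sample rows from the question and then groups by id and ISO week. The variable name by_iso is illustrative only.

```python
import pandas as pd

def fmt(row):
    # ISO year and zero-padded ISO week number; the day is ignored
    return f"{row.year}-{row.week:02d}"

# Sample rows from the question, typed in by hand
df = pd.DataFrame({
    "id": [1, 2, 1, 1, 2, 3, 1],
    "value": [20, 25, 10, 150, 200, 40, 75],
    "date": pd.to_datetime([
        "2022-01-01 12:20", "2022-01-04 18:20", "2022-01-04 11:20",
        "2022-01-06 16:20", "2022-01-08 13:20", "2022-01-04 21:20",
        "2022-01-09 08:20",
    ]),
})

df["iso"] = df["date"].dt.isocalendar().apply(fmt, axis="columns")

# One row per id, one column per ISO week, 0 where an id has no rows
by_iso = df.groupby(["id", "iso"])["value"].sum().unstack(fill_value=0)
print(by_iso)
```

ISO weeks run Monday to Sunday, so on this sample 2022-01-08 and 2022-01-09 land in the same ISO week as 2022-01-04. That differs from the Saturday-to-Friday weeks described in the question.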

3 answers

Answer 1: 2, accepted, 2023-01-17 22:14:25
Answer 2: 0, 2023-01-17 21:56:08
Answer 3: 0, 2023-01-17 21:56:12
