Pandas 時間序列：計算年度數據中具有最小值的工作日

Question

這是我在 Stackoverflow 上的第一個問題，我希望我能足夠詳細地描述我的問題。 我開始使用 Pandas 學習數據分析，並且我創建了一個時間序列，其中包含某個站點的汽油價格的每日數據。 我已經將每小時數據分組為每日數據。

一年中，我已經通過簡單的散點圖 plot 和 plotly 取得了成功，但在下一步中，我想分析每周哪個工作日最便宜或最貴，計算日期名稱，然后查看是否存在模式整年。

            count      mean       std    min    25%    50%     75%    max  \
2022-01-01   35.0  1.685000  0.029124  1.649  1.659  1.689  1.6990  1.749   
2022-01-02   27.0  1.673444  0.024547  1.649  1.649  1.669  1.6890  1.729   
2022-01-03   28.0  1.664000  0.040597  1.599  1.639  1.654  1.6890  1.789   
2022-01-04   31.0  1.635129  0.045069  1.599  1.599  1.619  1.6490  1.779   
2022-01-05   33.0  1.658697  0.048637  1.599  1.619  1.649  1.6990  1.769   
2022-01-06   35.0  1.658429  0.050756  1.599  1.619  1.639  1.6940  1.779   
2022-01-07   30.0  1.637333  0.039136  1.599  1.609  1.629  1.6565  1.759   
2022-01-08   41.0  1.655829  0.041740  1.619  1.619  1.639  1.6790  1.769   
2022-01-09   35.0  1.647857  0.031602  1.619  1.619  1.639  1.6590  1.769   
2022-01-10   31.0  1.634806  0.041374  1.599  1.609  1.619  1.6490  1.769 
...  

            week    weekday  
2022-01-01    52   Saturday  
2022-01-02    52     Sunday  
2022-01-03     1     Monday  
2022-01-04     1    Tuesday  
2022-01-05     1  Wednesday  
2022-01-06     1   Thursday  
2022-01-07     1     Friday  
2022-01-08     1   Saturday  
2022-01-09     1     Sunday  
2022-01-10     2     Monday
...

我嘗試了分組和重采樣，但不幸的是我沒有得到我希望的結果。

有人可以建議如何處理這個問題嗎？ 謝謝！

Answer 1

這是一種做我認為你的問題的方法：

import pandas as pd
import numpy as np
df = pd.DataFrame({
'count':[35,27,28,31,33,35,30,41,35,31]*40,
'mean':
[1.685,1.673444,1.664,1.635129,1.658697,1.658429,1.637333,1.655829,1.647857,1.634806]*40
},
index=pd.Series(pd.to_datetime(pd.date_range("2022-01-01", periods=400, freq="D"))))
print( '','input df:',df,sep='\n' )

df_date = df.reset_index()['index']
df['weekday'] = list(df_date.dt.day_name())
df['year'] = df_date.dt.year.to_numpy()
df['week'] = df_date.dt.isocalendar().week.to_numpy()
df['year_week_started'] = df.year - np.where((df.week>=52)&(df.week.shift(-7)==1),1,0)
print( '','input df with intermediate columns:',df,sep='\n' )

cols = ['year_week_started', 'week']
dfCheap = df.loc[df.groupby(cols)['mean'].idxmin(),:].set_index(cols)
dfCheap = ( dfCheap.groupby(['year_week_started', 'weekday'])['mean'].count()
    .rename('freq').to_frame().set_index('freq', append=True)
    .reset_index(level='weekday').sort_index(ascending=[True,False]) )
print( '','dfCheap:',dfCheap,sep='\n' )

dfExpensive = df.loc[df.groupby(cols)['mean'].idxmax(),:].set_index(cols)
dfExpensive = ( dfExpensive.groupby(['year_week_started', 'weekday'])['mean'].count()
    .rename('freq').to_frame().set_index('freq', append=True)
    .reset_index(level='weekday').sort_index(ascending=[True,False]) )
print( '','dfExpensive:',dfExpensive,sep='\n' )

示例輸入：

input df:
            count      mean
2022-01-01     35  1.685000
2022-01-02     27  1.673444
2022-01-03     28  1.664000
2022-01-04     31  1.635129
2022-01-05     33  1.658697
...           ...       ...
2023-01-31     35  1.658429
2023-02-01     30  1.637333
2023-02-02     41  1.655829
2023-02-03     35  1.647857
2023-02-04     31  1.634806

[400 rows x 2 columns]

input df with intermediate columns:
            count      mean    weekday  year week  year_week_started
2022-01-01     35  1.685000   Saturday  2022   52               2021
2022-01-02     27  1.673444     Sunday  2022   52               2021
2022-01-03     28  1.664000     Monday  2022    1               2022
2022-01-04     31  1.635129    Tuesday  2022    1               2022
2022-01-05     33  1.658697  Wednesday  2022    1               2022
...           ...       ...        ...   ...  ...                ...
2023-01-31     35  1.658429    Tuesday  2023    5               2023
2023-02-01     30  1.637333  Wednesday  2023    5               2023
2023-02-02     41  1.655829   Thursday  2023    5               2023
2023-02-03     35  1.647857     Friday  2023    5               2023
2023-02-04     31  1.634806   Saturday  2023    5               2023

[400 rows x 6 columns]

樣本 output：

dfCheap:
                          weekday
year_week_started freq
2021              1        Monday
2022              11      Tuesday
                  10     Thursday
                  10    Wednesday
                  6        Sunday
                  5        Friday
                  5        Monday
                  5      Saturday
2023              2      Thursday
                  1      Saturday
                  1        Sunday
                  1     Wednesday

dfExpensive:
                          weekday
year_week_started freq
2021              1      Saturday
2022              16       Monday
                  10      Tuesday
                  6        Sunday
                  5        Friday
                  5      Saturday
                  5      Thursday
                  5     Wednesday
2023              2        Monday
                  1        Friday
                  1      Thursday
                  1       Tuesday

Pandas 時間序列：計算年度數據中具有最小值的工作日

問題描述

1 個解決方案

解決方案1
0 已采納 2023-01-21 21:57:15

Pandas 時間序列：計算年度數據中具有最小值的工作日

問題描述

1 個解決方案

解決方案1 0 已采納 2023-01-21 21:57:15

解決方案1
0 已采納 2023-01-21 21:57:15