[英]Pandas Time Series: Count weekdays with min value in annual data
這是我在 Stackoverflow 上的第一個問題,我希望我能足夠詳細地描述我的問題。 我開始使用 Pandas 學習數據分析,並且我創建了一個時間序列,其中包含某個站點的汽油價格的每日數據。 我已經將每小時數據分組為每日數據。
一年中,我已經通過簡單的散點圖 plot 和 plotly 取得了成功,但在下一步中,我想分析每周哪個工作日最便宜或最貴,計算日期名稱,然后查看是否存在模式整年。
count mean std min 25% 50% 75% max \
2022-01-01 35.0 1.685000 0.029124 1.649 1.659 1.689 1.6990 1.749
2022-01-02 27.0 1.673444 0.024547 1.649 1.649 1.669 1.6890 1.729
2022-01-03 28.0 1.664000 0.040597 1.599 1.639 1.654 1.6890 1.789
2022-01-04 31.0 1.635129 0.045069 1.599 1.599 1.619 1.6490 1.779
2022-01-05 33.0 1.658697 0.048637 1.599 1.619 1.649 1.6990 1.769
2022-01-06 35.0 1.658429 0.050756 1.599 1.619 1.639 1.6940 1.779
2022-01-07 30.0 1.637333 0.039136 1.599 1.609 1.629 1.6565 1.759
2022-01-08 41.0 1.655829 0.041740 1.619 1.619 1.639 1.6790 1.769
2022-01-09 35.0 1.647857 0.031602 1.619 1.619 1.639 1.6590 1.769
2022-01-10 31.0 1.634806 0.041374 1.599 1.609 1.619 1.6490 1.769
...
week weekday
2022-01-01 52 Saturday
2022-01-02 52 Sunday
2022-01-03 1 Monday
2022-01-04 1 Tuesday
2022-01-05 1 Wednesday
2022-01-06 1 Thursday
2022-01-07 1 Friday
2022-01-08 1 Saturday
2022-01-09 1 Sunday
2022-01-10 2 Monday
...
我嘗試了分組和重采樣,但不幸的是我沒有得到我希望的結果。
有人可以建議如何處理這個問題嗎? 謝謝!
這是一種做我認為你的問題的方法:
import pandas as pd
import numpy as np
df = pd.DataFrame({
'count':[35,27,28,31,33,35,30,41,35,31]*40,
'mean':
[1.685,1.673444,1.664,1.635129,1.658697,1.658429,1.637333,1.655829,1.647857,1.634806]*40
},
index=pd.Series(pd.to_datetime(pd.date_range("2022-01-01", periods=400, freq="D"))))
print( '','input df:',df,sep='\n' )
df_date = df.reset_index()['index']
df['weekday'] = list(df_date.dt.day_name())
df['year'] = df_date.dt.year.to_numpy()
df['week'] = df_date.dt.isocalendar().week.to_numpy()
df['year_week_started'] = df.year - np.where((df.week>=52)&(df.week.shift(-7)==1),1,0)
print( '','input df with intermediate columns:',df,sep='\n' )
cols = ['year_week_started', 'week']
dfCheap = df.loc[df.groupby(cols)['mean'].idxmin(),:].set_index(cols)
dfCheap = ( dfCheap.groupby(['year_week_started', 'weekday'])['mean'].count()
.rename('freq').to_frame().set_index('freq', append=True)
.reset_index(level='weekday').sort_index(ascending=[True,False]) )
print( '','dfCheap:',dfCheap,sep='\n' )
dfExpensive = df.loc[df.groupby(cols)['mean'].idxmax(),:].set_index(cols)
dfExpensive = ( dfExpensive.groupby(['year_week_started', 'weekday'])['mean'].count()
.rename('freq').to_frame().set_index('freq', append=True)
.reset_index(level='weekday').sort_index(ascending=[True,False]) )
print( '','dfExpensive:',dfExpensive,sep='\n' )
示例輸入:
input df:
count mean
2022-01-01 35 1.685000
2022-01-02 27 1.673444
2022-01-03 28 1.664000
2022-01-04 31 1.635129
2022-01-05 33 1.658697
... ... ...
2023-01-31 35 1.658429
2023-02-01 30 1.637333
2023-02-02 41 1.655829
2023-02-03 35 1.647857
2023-02-04 31 1.634806
[400 rows x 2 columns]
input df with intermediate columns:
count mean weekday year week year_week_started
2022-01-01 35 1.685000 Saturday 2022 52 2021
2022-01-02 27 1.673444 Sunday 2022 52 2021
2022-01-03 28 1.664000 Monday 2022 1 2022
2022-01-04 31 1.635129 Tuesday 2022 1 2022
2022-01-05 33 1.658697 Wednesday 2022 1 2022
... ... ... ... ... ... ...
2023-01-31 35 1.658429 Tuesday 2023 5 2023
2023-02-01 30 1.637333 Wednesday 2023 5 2023
2023-02-02 41 1.655829 Thursday 2023 5 2023
2023-02-03 35 1.647857 Friday 2023 5 2023
2023-02-04 31 1.634806 Saturday 2023 5 2023
[400 rows x 6 columns]
樣本 output:
dfCheap:
weekday
year_week_started freq
2021 1 Monday
2022 11 Tuesday
10 Thursday
10 Wednesday
6 Sunday
5 Friday
5 Monday
5 Saturday
2023 2 Thursday
1 Saturday
1 Sunday
1 Wednesday
dfExpensive:
weekday
year_week_started freq
2021 1 Saturday
2022 16 Monday
10 Tuesday
6 Sunday
5 Friday
5 Saturday
5 Thursday
5 Wednesday
2023 2 Monday
1 Friday
1 Thursday
1 Tuesday
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.