How to count pandas datetime months by consecutive seasons

Question

I have a large time-series dataframe. The column has already been formatted as datetime, for example:

2017-10-06T00:00:00+00:00
2020-04-29 00:00:00+00:00

I want to plot the number of samples per season, like below. The values are the sample counts for that season.

1997 Winter 4
1997 Spring 8
1997 Summer 8
...
2020 Winter 32

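I did some searching and realized I could build a dictionary to map months to seasons. The tricky part, however, is that a "real winter" spans two calendar years. For example, Winter 1997 should actually contain December 1997, January 1998, and February 1998.

Note that I want January 1997 and February 1997 excluded from Winter 1997, because they belong to Winter 1996.

What is the most efficient way to do this? It doesn't have to be named '1997 Winter'; any index works for me as long as the counts run consecutively from start to end.

Thanks a lot!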

Answer 1

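There is a quick way to solve it, though it is not very orthodox... You create a 'Season' column and assign the seasons with np.where(). At first, you say the first 3 months are Winter, the next 3 months are Spring, and so on. Then you apply shift(-1) on the column to move it back one row, so each row takes the label of the following month (December becomes Winter, March becomes Spring). That gives you your seasons; you only need to fill the last NaN. You can then solve your problem in a lazy way. If you are not comfortable with the code, tell me and I will edit it.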

Edit:

I assume the dates are in the index. If not, use .dt.month instead of .month. I broke it down to make it clear.

import numpy as np

_condition_winter = (df.index.month>=1)&(df.index.month<=3)
_condtion_spring = (df.index.month>=4)&(df.index.month<=6)
_condition_summer = (df.index.month>=7)&(df.index.month<=9)
_condition_automn = (df.index.month>=10)&(df.index.month<=12)
df['Season'] = np.where(_condition_winter,'Winter',np.where(_condtion_spring,'Spring',np.where(_condition_summer,'Summer',np.where(_condition_automn,'Autumn',np.nan))))
# Shift up one row so each month takes the next month's label, then forward-fill the last NaN
df['Season'] = df['Season'].shift(-1).ffill()

Edit 2:

Here is a complete example:

import numpy as np
import pandas as pd

# 28 monthly rows, September 1983 to December 1985 (month-start dates)
dates = pd.date_range("1983-09-01","1985-12-31",freq="MS")
df = pd.DataFrame(np.random.randint(100, 200,size=28)/100,index =dates,columns=["Sample"])
df = df.sort_index()
_condition_winter = (df.index.month>=1)&(df.index.month<=3)
_condtion_spring = (df.index.month>=4)&(df.index.month<=6)
_condition_summer = (df.index.month>=7)&(df.index.month<=9)
_condition_automn = (df.index.month>=10)&(df.index.month<=12)
df['Season'] = np.where(_condition_winter,'Winter',np.where(_condtion_spring,'Spring',np.where(_condition_summer,'Summer',np.where(_condition_automn,'Autumn',np.nan))))
# Append the year, then shift up one row: December 1983 becomes 'Winter_1984'
df['Season'] = df['Season']+'_'+df.index.strftime(date_format='%Y')
df['Season'] = df['Season'].shift(-1).ffill()
print('Sample for winter 1984 = ',df[df.Season=='Winter_1984'].Sample.sum())

Edit 3:

If you have several rows in the same month, here is the complete example:

import numpy as np
import pandas as pd

#### Build our df
#### This is just to make it clear that we will have 2 rows for each month. It could be more or fewer.
dates = pd.date_range("1983-09-01","1985-12-31",freq="MS")
dates2 = pd.date_range("1983-09-01","1985-12-31",freq="MS")
df1 = pd.concat([
    pd.DataFrame(np.random.randint(100, 200,size=28)/100,index =dates,columns=["Sample"]),
    pd.DataFrame(np.random.randint(100, 200,size=28)/100,index =dates2,columns=["Sample"]),
])
df1 = df1.sort_index()
#### Now, to keep it clear, even if we could do it faster, let's build a dataframe with 1 row per month holding the total of the samples
#### (the two frames share the same dates, so grouping on the index merges them)
df = df1.groupby(df1.index).sum()
#### Let's sort by date to be sure that it won't be messy
#### If you've got a 'Date' column and not the index, apply a .sort_values('Date') instead of sort_index
df = df.sort_index()
#### If you've got a 'Date' column, it will be df.Date.dt.month instead of df.index.month
_condition_winter = (df.index.month>=1)&(df.index.month<=3)
_condtion_spring = (df.index.month>=4)&(df.index.month<=6)
_condition_summer = (df.index.month>=7)&(df.index.month<=9)
_condition_automn = (df.index.month>=10)&(df.index.month<=12)
df['Season'] = np.where(_condition_winter,'Winter',np.where(_condtion_spring,'Spring',np.where(_condition_summer,'Summer',np.where(_condition_automn,'Autumn',np.nan))))
df['Season'] = df['Season']+'_'+df.index.strftime(date_format='%Y')
df['Season'] = df['Season'].shift(-1).ffill()
print('Sample for winter 1984 = ',df[df.Season=='Winter_1984'].Sample.sum())
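The shift(-1) trick relies on having exactly one row per calendar month with no gaps. As a hedged alternative, not part of the original answer, the same labels can be computed directly from each timestamp's month and year, so duplicate or missing months do not matter. The helper name season_labels and the sample dates are made up for illustration; the labelling convention follows this answer (December 1983 counts as 'Winter_1984').

```python
import numpy as np
import pandas as pd

SEASON_NAMES = np.array(["Winter", "Spring", "Summer", "Autumn"])

def season_labels(index):
    """Hypothetical helper: 'Season_Year' labels for a DatetimeIndex."""
    month = index.month.to_numpy()
    # Dec/Jan/Feb -> 0, Mar-May -> 1, Jun-Aug -> 2, Sep-Nov -> 3
    code = (month % 12) // 3
    # December is counted toward the winter named after the following January
    year = index.year.to_numpy() + (month == 12)
    return np.array([f"{SEASON_NAMES[c]}_{y}" for c, y in zip(code, year)])

# Illustrative data: two rows in December 1983, none in April-November 1984
dates = pd.to_datetime(
    ["1983-12-15", "1983-12-20", "1984-01-10", "1984-02-28", "1984-03-01", "1984-12-31"],
    utc=True,
)
df = pd.DataFrame({"Sample": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}, index=dates)
df["Season"] = season_labels(df.index)

# sort=False keeps the seasons in order of first appearance (chronological for sorted input)
counts = df.groupby("Season", sort=False).size()
print(counts)
```

Because the label depends only on the row's own timestamp, no pre-aggregation to one row per month is needed before counting.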

Answer 2

I think you should create a function, applied through a lambda, that picks the right season based on the month and day values.

def seasons(date):
    m = date.month
    d = date.day
    season=None
    # Each season runs from the 21st of its first month to the 20th of its last month
    if (3==m and d>=21) or m==4 or m==5 or (m==6 and d<=20):
        season = 'spring'
    elif (6==m and d>=21 ) or m==7 or m==8 or (m==9 and d<=20):
        season = 'summer'
    elif (9==m and d>=21 ) or m==10 or m==11 or (m==12 and d<=20):
        season = 'autumn'
    elif (12==m and d>=21 ) or m==1 or m==2 or (m==3 and d<=20):
        season = 'winter'
    return season

df['season'] = df.apply(lambda x: seasons(x['date']), axis=1)

Note that the season is also chosen by day, because winter is defined as running from December 21 to March 20, and so on.
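A season name alone does not separate one winter from the next, so counting per consecutive season also needs a season year. The following is a minimal sketch under assumptions, not part of the original answer: the function name season_and_year, the 'date' column contents, and the choice to label a winter by the year of its December (the convention asked for in the question) are all illustrative.

```python
import pandas as pd

def season_and_year(ts):
    """Hypothetical helper: (season, season_year) using day-based boundaries."""
    md = (ts.month, ts.day)
    if (3, 21) <= md <= (6, 20):
        return "spring", ts.year
    if (6, 21) <= md <= (9, 20):
        return "summer", ts.year
    if (9, 21) <= md <= (12, 20):
        return "autumn", ts.year
    # Winter: Dec 21-31 keeps its own year, Jan 1 - Mar 20 belongs to the previous year's winter
    winter_year = ts.year if ts.month == 12 else ts.year - 1
    return "winter", winter_year

# Illustrative data only
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["1997-12-25", "1998-01-15", "1998-03-20", "1998-03-21", "1998-06-21", "1998-12-20"],
        utc=True,
    )
})

pairs = [season_and_year(ts) for ts in df["date"]]
df["season"] = [season for season, _ in pairs]
df["season_year"] = [year for _, year in pairs]

counts = df.groupby(["season_year", "season"]).size()
print(counts)
```

Here the three dates from December 25, 1997 through March 20, 1998 all land in the (1997, 'winter') group.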

Answer 3

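I found another way to solve it, so I want to leave it here.

1. Shift all samples forward by 1 month.
2. Attach the season (quarter) by month.
3. Then you can process the samples any way you want.

In code, it could look like this: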

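from dateutil.relativedelta import relativedelta

# Shift every timestamp forward one month so Dec/Jan/Feb fall in quarter 1 of the same year
df['shift_time'] = df['real_datetime'].apply(lambda t: t + relativedelta(months=1))
df['season'] = df['shift_time'].dt.quarter
# Count rows per (shifted year, quarter); quarter 1 is winter
grouped = df.groupby([df['shift_time'].dt.year, df['season']]).count()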
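To see the month shift at work on concrete dates, here is a small sketch with made-up data. It swaps in pd.DateOffset for relativedelta so the shift is applied to the whole column at once, and maps quarters to season names; the column names season_year and season and the name mapping are assumptions for illustration.

```python
import pandas as pd

# Illustrative data only
df = pd.DataFrame({
    "real_datetime": pd.to_datetime(
        ["1997-01-15", "1997-12-01", "1998-01-31", "1998-02-10", "1998-03-05", "1998-11-30"],
        utc=True,
    )
})

# One-month shift on the whole column; month-end dates are clipped (Jan 31 -> Feb 28)
shifted = df["real_datetime"] + pd.DateOffset(months=1)

# After the shift, Dec/Jan/Feb sit in quarter 1, Mar-May in quarter 2, and so on
season_names = {1: "Winter", 2: "Spring", 3: "Summer", 4: "Autumn"}
df["season_year"] = shifted.dt.year
df["season"] = shifted.dt.quarter.map(season_names)

# sort=False keeps groups in order of first appearance (chronological for sorted input)
counts = df.groupby(["season_year", "season"], sort=False).size()
print(counts)
```

With this labelling, December 1997 is grouped with January and February 1998 under season_year 1998. Subtracting 1 from season_year for winter rows would give the "1997 Winter" naming from the question instead.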

3 solutions

Solution 1
1 Accepted 2021-01-30 19:52:06

Solution 2
1 2021-01-30 20:18:12

Solution 3
0 2021-01-30 20:30:07
