How to group entries by consecutive dates in Python/Pandas

Question

I have a pandas Series named hot_days that looks like this:

0     1980-06-04
1     1981-08-05
2     1982-06-04
3     1982-06-05
4     1982-07-08
         ...    
294   2019-07-25
295   2019-08-24
296   2019-08-25
297   2019-08-26
298   2019-08-27

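It is a list of dates on which the temperature at a given location was above a threshold. I want to detect and record when heat waves occur, that is, when the temperature exceeds this threshold for three or more consecutive days. I want to end up with a DataFrame containing the start date of each heat wave and its length. By applying: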

new_series = (hot_days == hot_days.shift(2)+pd.Timedelta("2 days")) * (hot_days.groupby((hot_days == hot_days.shift(2)+pd.Timedelta("2 days")).cumsum()).cumcount()+1)

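I get a series in which dates during a heat wave are 1 and dates outside a heat wave are 0, which I think is a step in the right direction. However, since I am new to pandas, I am not sure how to reach my goal. I know I could use a loop, but I understand that this is considered "un-Pythonic" because loops are slow in Python, so I would rather find a more elegant solution (although the dataset is small enough that a loop would run in a reasonable amount of time).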

Answer 1

Let's call the initial series s.

Identify the heat wave days:

waves = s.eq(s.shift(1)+pd.DateOffset(days=1)) & s.eq(s.shift(2)+pd.DateOffset(days=2))
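To see which rows this mask flags, here is a minimal sketch on a hypothetical five-date series (the sample dates are an assumption for illustration, not the asker's full data). A row is flagged only when the two preceding entries are the two preceding calendar days, so the first two days of each run stay False.

```python
import pandas as pd

# Hypothetical sample: one isolated day followed by a four-day run.
s = pd.Series(pd.to_datetime([
    "2019-07-25", "2019-08-24", "2019-08-25", "2019-08-26", "2019-08-27",
]))

# Same mask as in the answer: the previous entry is one day earlier
# and the entry before that is two days earlier.
waves = (s.eq(s.shift(1) + pd.DateOffset(days=1))
         & s.eq(s.shift(2) + pd.DateOffset(days=2)))

print(waves.tolist())  # [False, False, False, True, True]
```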

Create a DataFrame with the wave flag and a wave group id:

df = pd.concat({'date': s,
                'wave': waves,
                'group': waves.diff(1).ne(0).cumsum()
                }, axis=1)

List the waves and their durations:

pd.DataFrame({gid: pd.Series({'start': g.iloc[0]['date'],
                              'end': g.iloc[-1]['date'],
                              'duration': len(g)})
              for gid, g in df[df['wave']].groupby('group')
              }).T

Output:

       start        end duration
2 2019-08-26 2019-08-27        2

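Note: my result differs slightly because the dataset is incomplete. Also note that waves is True only from the third consecutive day onward, so start and duration here describe the flagged rows rather than the full run of hot days.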
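If the goal is the first day and the full length of each run, one possible alternative (a sketch, not the answer's method) is to group on gaps between dates directly. It assumes the series is sorted and has no duplicate dates; the sample dates and the names run_id, runs, and heat_waves are hypothetical.

```python
import pandas as pd

# Hypothetical sample dates (not the asker's full data).
s = pd.Series(pd.to_datetime([
    "1982-06-04", "1982-06-05", "1982-07-08",
    "2019-08-24", "2019-08-25", "2019-08-26", "2019-08-27",
]))

# A new run starts wherever the gap to the previous date is not exactly one day.
run_id = s.diff().ne(pd.Timedelta(days=1)).cumsum()

# Summarise each run by its first date, last date, and number of days.
runs = s.groupby(run_id).agg(start="first", end="last", duration="count")

# Keep only runs of three or more days.
heat_waves = runs[runs["duration"] >= 3].reset_index(drop=True)
print(heat_waves)
```

With this sample, the only run that survives the filter starts on 2019-08-24 and lasts 4 days.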

Edit: here is how waves.diff(1).ne(0).cumsum() works:

    bool   diff  diff_int  diff_not  diff_not_int  diff_not_cumsum
0   True    NaN       NaN      True             1                1
1  False   True      -1.0      True             1                2
2  False  False       0.0     False             0                2
3   True   True       1.0      True             1                3
4   True  False       0.0     False             0                3
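The last column of the table can be reproduced with a short sketch. Note that it swaps in the common ne(shift()) idiom rather than diff(1).ne(0); for a boolean series without missing values, both flag the rows where the value changes.

```python
import pandas as pd

# The "bool" column from the table above.
waves = pd.Series([True, False, False, True, True])

# True wherever the value differs from the previous row
# (the first row has no predecessor, so it always counts as a change).
changed = waves.ne(waves.shift())

# The cumulative sum of the change flags gives one id per run of equal values.
group = changed.cumsum()

print(group.tolist())  # [1, 2, 2, 3, 3]
```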

Answer 2

We can use shift to build a running count, which you can then use later in a loop.

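import pandas as pd

s = pd.Series([0,1,1,0,0,0,1,1,1,1,1,0,0,1,1,0,0,0,0,1,1,0,0,1,0,0])
# Run id increases whenever the value changes; cumcount numbers the rows inside each run
s * (s.groupby((s != s.shift()).cumsum()).cumcount() + 1)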

This gives a running count of the position within each run of 1s, with 0 wherever s is 0.
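As a smaller illustration of the same expression, here is a minimal sketch on a hypothetical seven-element series (the input values and the names run_id and counts are assumptions chosen to keep the output short).

```python
import pandas as pd

# Hypothetical short input: a run of two 1s and a run of three 1s.
s = pd.Series([0, 1, 1, 0, 1, 1, 1])

# Run id: increases by one every time the value changes.
run_id = (s != s.shift()).cumsum()

# Position within each run (1-based), zeroed out where s is 0.
counts = s * (s.groupby(run_id).cumcount() + 1)

print(counts.tolist())  # [0, 1, 2, 0, 1, 2, 3]
```

The maximum of counts within each run is that run's length, so a value of 3 or more marks a run long enough to count as a heat wave.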

Alternatively, we can iterate over a groupby to get a separate list for each run.

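df = pd.DataFrame({"a": s})
# Group rows by run id; the id increases whenever the value changes.
# Passing the grouper directly (not wrapped in a list) keeps i a plain integer.
for i, g in df.groupby((df.a != df.a.shift()).cumsum()):
    print(i, end="")
    print(g)
    print(g.a.tolist())
    print("--")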

This prints:

1   a
0  0
[0]
--
2   a
1  1
2  1
[1, 1]
--
3   a
3  0
4  0
5  0
[0, 0, 0]
--
4    a
6   1
7   1
8   1
9   1
10  1
[1, 1, 1, 1, 1]
--
5    a
11  0
12  0
[0, 0]
--
6    a
13  1
14  1
[1, 1]
--
7    a
15  0
16  0
17  0
18  0
[0, 0, 0, 0]
--
8    a
19  1
20  1
[1, 1]
--
9    a
21  0
22  0
[0, 0]
--
10    a
23  1
[1]
--
11    a
24  0
25  0
[0, 0]
--
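Instead of printing each group, the same loop can collect one (value, length) pair per run. This is a minimal sketch on a hypothetical short input; the names run_id, runs, and long_runs are illustrative.

```python
import pandas as pd

# Hypothetical short input: a run of two 1s and a run of three 1s.
s = pd.Series([0, 1, 1, 0, 1, 1, 1])
df = pd.DataFrame({"a": s})

# Run id: increases by one every time the value changes.
run_id = (df.a != df.a.shift()).cumsum()

# One (value, length) pair per run, in order of appearance.
runs = [(int(g.a.iloc[0]), len(g)) for _, g in df.groupby(run_id)]
print(runs)  # [(0, 1), (1, 2), (0, 1), (1, 3)]

# Lengths of the runs of 1s that last three rows or more.
long_runs = [n for value, n in runs if value == 1 and n >= 3]
print(long_runs)  # [3]
```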

Answer 1
1 accepted 2021-07-12 16:56:47

Answer 2
0 2021-07-12 16:24:57
