Count groups of values in Pandas series
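I have a Pandas (pandas==0.23.4) DataFrame df with a datetime index and a column named value_id. value_id contains groups of float values (5.0 or 6.0) and groups of NaN. I want to count the number of consecutive groups of 5.0 and of 6.0. A group only counts if it contains at least three consecutive values.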
For example:
In [1]: print df.value_id
timestamp
2019-01-06 17:42:08 NaN
2019-01-06 17:45:08 5.0
2019-01-06 17:48:08 5.0
2019-01-06 17:51:08 5.0
2019-01-06 17:54:08 NaN
2019-01-06 17:57:08 NaN
2019-01-06 18:00:08 NaN
2019-01-06 18:03:08 NaN
2019-01-06 18:06:08 NaN
2019-01-06 18:09:08 NaN
2019-01-06 18:12:08 6.0
2019-01-06 18:15:08 6.0
2019-01-06 19:54:09 NaN
2019-01-06 19:57:09 5.0
2019-01-06 20:00:08 5.0
2019-01-06 20:03:08 5.0
2019-01-06 20:06:09 NaN
2019-01-06 20:09:08 NaN
2019-01-06 20:12:08 NaN
2019-01-06 20:15:09 NaN
2019-01-06 20:18:08 NaN
2019-01-06 20:21:09 NaN
2019-01-06 20:24:09 NaN
2019-01-07 19:09:07 NaN
2019-01-07 19:12:06 NaN
2019-01-07 19:15:06 5.0
2019-01-07 19:18:06 5.0
2019-01-07 19:21:07 5.0
2019-01-07 19:24:07 5.0
2019-01-07 19:27:07 NaN
2019-01-07 19:30:07 NaN
2019-01-07 19:33:06 NaN
2019-01-07 19:36:07 NaN
2019-01-07 19:39:07 NaN
2019-01-07 19:42:06 NaN
2019-01-07 19:45:06 NaN
2019-01-07 19:48:06 NaN
2019-01-07 19:51:06 6.0
2019-01-07 19:54:07 6.0
2019-01-07 19:57:06 6.0
Name: value_id, dtype: float64
If I have two variables named count1 (for the 5.0 groups) and count2 (for the 6.0 groups), the resulting counts for the example above would be:
count1: 3
count2: 1
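To try things out without the original data, the series can be rebuilt roughly like this. This is only a sketch: the values are copied from the printout above, but the evenly spaced 3-minute index is an assumption standing in for the real, irregular timestamps.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Values copied from the printed series, in order (40 rows).
values = ([nan] + [5.0] * 3 + [nan] * 6 + [6.0] * 2 + [nan]
          + [5.0] * 3 + [nan] * 9 + [5.0] * 4 + [nan] * 8 + [6.0] * 3)

# ASSUMPTION: a regular 3-minute index replaces the real timestamps;
# only the order of the values matters for counting groups.
index = pd.date_range("2019-01-06 17:42:08", periods=len(values),
                      freq="3min", name="timestamp")
df = pd.DataFrame({"value_id": values}, index=index)
```

With this frame, df.value_id has the same sequence of 5.0, 6.0 and NaN values as the example, so the expected counts stay count1 = 3 and count2 = 1.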
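Perhaps not the most elegant, but you can use shift to check that the next two items have the same value and that the previous value is not part of the same group: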
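# Flag the first row of every run of at least three consecutive 5.0 values:
# the row and the next two rows equal 5, and the previous row does not.
df['fives'] = ((df['value_id'] == 5) & (df['value_id'].shift(-1) == 5)
               & (df['value_id'].shift(-2) == 5)
               & (df['value_id'].shift(1) != 5)).astype(int)
# Same check for runs of 6.0.
df['sixes'] = ((df['value_id'] == 6) & (df['value_id'].shift(-1) == 6)
               & (df['value_id'].shift(-2) == 6)
               & (df['value_id'].shift(1) != 6)).astype(int)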
df[['fives','sixes']].sum()
fives 3
sixes 1
dtype: int64
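The shift check hard-codes a minimum run length of three. If the threshold needs to vary, one option is to label each run and measure its length. A minimal sketch, assuming a hypothetical helper count_runs (not part of the answer above) and the values from the question on a default integer index:

```python
import numpy as np
import pandas as pd

def count_runs(s, value, min_len=3):
    """Count runs of `value` in `s` that are at least `min_len` rows long."""
    is_val = s.eq(value)
    # A new run starts wherever the flag changes from the previous row
    # (the first row always starts a run because its diff is NaN).
    run_id = is_val.astype(int).diff().ne(0).cumsum()
    # Summing the boolean flag per run gives the run length for runs of
    # `value` and 0 for every other run.
    run_lengths = is_val.groupby(run_id).sum()
    return int((run_lengths >= min_len).sum())

nan = np.nan
# Values copied from the question's printout, in order.
values = ([nan] + [5.0] * 3 + [nan] * 6 + [6.0] * 2 + [nan]
          + [5.0] * 3 + [nan] * 9 + [5.0] * 4 + [nan] * 8 + [6.0] * 3)
s = pd.Series(values, name="value_id")

count1 = count_runs(s, 5.0)
count2 = count_runs(s, 6.0)
```

Because runs are delimited by any change in value, this variant also keeps a run of 5.0 separate from a run of 6.0 that follows it directly, with no NaN in between.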
IIUC, create the group key with cumsum, then we just do value_counts (s is the Series of values):
s.groupby(s.isnull().cumsum()).value_counts().ge(3).sum(level=1)
Out[1026]:
timestamp
5.0 3.0
6.0 1.0
Name: timestamp, dtype: float64
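The one-liner relies on sum(level=1), which matches pandas 0.23.4 but, as far as I know, is no longer accepted in recent pandas releases (2.0 and later). A sketch of the same idea with groupby(level=1) instead, assuming the values from the question; the name "block" for the group key is my own choice, used only to avoid clashing with the series name:

```python
import numpy as np
import pandas as pd

nan = np.nan
# Values copied from the question's printout, in order.
values = ([nan] + [5.0] * 3 + [nan] * 6 + [6.0] * 2 + [nan]
          + [5.0] * 3 + [nan] * 9 + [5.0] * 4 + [nan] * 8 + [6.0] * 3)
s = pd.Series(values, name="value_id")

# Every NaN bumps the cumulative sum, so the non-NaN values that follow
# a NaN share one block id until the next NaN appears.
block = s.isnull().cumsum().rename("block")

counts = (
    s.groupby(block)
     .value_counts()    # occurrences of each value inside each block
     .ge(3)             # keep only blocks with at least three of that value
     .groupby(level=1)  # level 1 of the index is the value (5.0 or 6.0)
     .sum()             # number of qualifying blocks per value
)
```

Note that the blocks are delimited by NaN only, so this assumes every run of 5.0 or 6.0 is separated from the next by at least one NaN, as in the example data.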