
Find index of consecutive values within a dataframe when the number of consecutive values is below a certain threshold

I have a dataframe that looks like this:

                     night  DSWRF_integ
ForecastTime
2018-05-12 00:00:00    1.0            1
2018-05-12 00:15:00    0.0            1
2018-05-12 00:30:00    0.0            1
2018-05-12 00:45:00    0.0            1
2018-05-12 01:00:00    0.0            0
2018-05-12 01:15:00    0.0            0
2018-05-12 01:30:00    0.0            0
2018-05-12 01:45:00    0.0            0
2018-05-12 02:00:00    0.0            0
2018-05-12 02:15:00    0.0            0
2018-05-12 02:30:00    0.0            0
2018-05-12 02:45:00    0.0            0
2018-05-12 03:00:00    0.0            0
2018-05-12 03:15:00    0.0            0
2018-05-12 03:30:00    0.0            0
2018-05-12 03:45:00    0.0            0
2018-05-12 04:00:00    0.0            0
2018-05-12 04:15:00    0.0            0
2018-05-12 04:30:00    0.0            0
2018-05-12 04:45:00    0.0            0
2018-05-12 05:00:00    0.0            0
2018-05-12 05:15:00    0.0            0
2018-05-12 05:30:00    0.0            0
2018-05-12 05:45:00    0.0            0
2018-05-12 06:00:00    0.0            0
2018-05-12 06:15:00    0.0            0
2018-05-12 06:30:00    0.0            0
2018-05-12 06:45:00    0.0            0
2018-05-12 07:00:00    0.0            0
2018-05-12 07:15:00    0.0            0
2018-05-12 07:30:00    0.0            0
2018-05-12 07:45:00    0.0            0
2018-05-12 08:00:00    0.0            0
2018-05-12 08:15:00    0.0            0
2018-05-12 08:30:00    0.0            0
2018-05-12 08:45:00    0.0            0
2018-05-12 09:00:00    0.0            0
2018-05-12 09:15:00    0.0            0
2018-05-12 09:30:00    0.0            0
2018-05-12 09:45:00    0.0            0
2018-05-12 10:00:00    0.0            0
2018-05-12 10:15:00    0.0            0
2018-05-12 10:30:00    0.0            0
2018-05-12 10:45:00    0.0            0
2018-05-12 11:00:00    0.0            0
2018-05-12 11:15:00    0.0            1
2018-05-12 11:30:00    0.0            1
2018-05-12 11:45:00    0.0            1

2018-05-12 12:00:00    0.0            0
2018-05-12 12:15:00    0.0            0
2018-05-12 12:30:00    0.0            0
2018-05-12 12:45:00    0.0            0
2018-05-12 13:00:00    0.0            0
2018-05-12 13:15:00    0.0            0
2018-05-12 13:30:00    0.0            0
2018-05-12 13:45:00    0.0            0

2018-05-12 14:00:00    1.0            1
2018-05-12 14:15:00    1.0            1
2018-05-12 14:30:00    1.0            1
2018-05-12 14:45:00    1.0            1
2018-05-12 15:00:00    1.0            1

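I am trying to work out a way, without iterating over the dataframe because that is too slow, to convert consecutive zeros in column DSWRF_integ to 1, but only when the number of consecutive zeros is below a certain threshold (for example, threshold = 10).

In this particular case, I want to replace all zeros in column DSWRF_integ with 1 for the period 2018-05-12 12:00:00 to 2018-05-12 13:45:00, because that run contains fewer than 10 consecutive zeros.

The resulting dataframe should look like this: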

                     night  DSWRF_integ
ForecastTime
2018-05-12 00:00:00    1.0            1
2018-05-12 00:15:00    0.0            1
2018-05-12 00:30:00    0.0            1
2018-05-12 00:45:00    0.0            1
2018-05-12 01:00:00    0.0            0
2018-05-12 01:15:00    0.0            0
2018-05-12 01:30:00    0.0            0
2018-05-12 01:45:00    0.0            0
2018-05-12 02:00:00    0.0            0
2018-05-12 02:15:00    0.0            0
2018-05-12 02:30:00    0.0            0
2018-05-12 02:45:00    0.0            0
2018-05-12 03:00:00    0.0            0
2018-05-12 03:15:00    0.0            0
2018-05-12 03:30:00    0.0            0
2018-05-12 03:45:00    0.0            0
2018-05-12 04:00:00    0.0            0
2018-05-12 04:15:00    0.0            0
2018-05-12 04:30:00    0.0            0
2018-05-12 04:45:00    0.0            0
2018-05-12 05:00:00    0.0            0
2018-05-12 05:15:00    0.0            0
2018-05-12 05:30:00    0.0            0
2018-05-12 05:45:00    0.0            0
2018-05-12 06:00:00    0.0            0
2018-05-12 06:15:00    0.0            0
2018-05-12 06:30:00    0.0            0
2018-05-12 06:45:00    0.0            0
2018-05-12 07:00:00    0.0            0
2018-05-12 07:15:00    0.0            0
2018-05-12 07:30:00    0.0            0
2018-05-12 07:45:00    0.0            0
2018-05-12 08:00:00    0.0            0
2018-05-12 08:15:00    0.0            0
2018-05-12 08:30:00    0.0            0
2018-05-12 08:45:00    0.0            0
2018-05-12 09:00:00    0.0            0
2018-05-12 09:15:00    0.0            0
2018-05-12 09:30:00    0.0            0
2018-05-12 09:45:00    0.0            0
2018-05-12 10:00:00    0.0            0
2018-05-12 10:15:00    0.0            0
2018-05-12 10:30:00    0.0            0
2018-05-12 10:45:00    0.0            0
2018-05-12 11:00:00    0.0            0
2018-05-12 11:15:00    0.0            1
2018-05-12 11:30:00    0.0            1
2018-05-12 11:45:00    0.0            1

2018-05-12 12:00:00    0.0            1
2018-05-12 12:15:00    0.0            1
2018-05-12 12:30:00    0.0            1
2018-05-12 12:45:00    0.0            1
2018-05-12 13:00:00    0.0            1
2018-05-12 13:15:00    0.0            1
2018-05-12 13:30:00    0.0            1
2018-05-12 13:45:00    0.0            1

2018-05-12 14:00:00    1.0            1
2018-05-12 14:15:00    1.0            1
2018-05-12 14:30:00    1.0            1
2018-05-12 14:45:00    1.0            1
2018-05-12 15:00:00    1.0            1

I have tried various approaches using helper columns, but none produced anything close to what I want. Any help would be much appreciated :)

You can do the following:

th = 3  # threshold: runs of zeros shorter than this are replaced

# True for rows where the value is 0
x = df.DSWRF_integ.eq(0)

# Run labels: cumulative count of the positions where x changes (diff != 0)
g = x.astype(int).diff().fillna(0).ne(0).cumsum()

# Among the zero rows, flag those whose run of consecutive zeros
# is shorter than the threshold
ix = x[x].groupby(g[x]).transform('size').lt(th)
# Replace the flagged zeros with 1
df.loc[ix[ix].index, 'DSWRF_integ'] = 1
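
The same idea can be wrapped in a small helper and tried on data shaped like the question's. This is only a sketch under a few assumptions: the function name fill_short_zero_runs is hypothetical, the sample series is a shortened stand-in for the real DSWRF_integ column, and the run labels are built with ne(shift()) rather than diff(), which is a swapped-in but equivalent way of detecting where the zero/non-zero state changes.

```python
import pandas as pd


def fill_short_zero_runs(s: pd.Series, threshold: int) -> pd.Series:
    """Return a copy of s where runs of zeros shorter than threshold become 1.

    Hypothetical helper name; not part of pandas.
    """
    is_zero = s.eq(0)
    # A new run starts wherever the zero/non-zero state differs from the previous row.
    run_id = is_zero.ne(is_zero.shift()).cumsum()
    # Length of the run that each row belongs to.
    run_len = is_zero.groupby(run_id).transform('size')
    # Replace only zeros that sit in a short run; everything else is kept.
    return s.mask(is_zero & run_len.lt(threshold), 1)


# Shortened, made-up sample: a long run of 12 zeros, three ones,
# a short run of 8 zeros, then two ones, on a 15-minute index.
values = [0] * 12 + [1] * 3 + [0] * 8 + [1] * 2
idx = pd.date_range('2018-05-12 08:00:00', periods=len(values), freq='15min')
df = pd.DataFrame({'DSWRF_integ': values}, index=idx)
df.index.name = 'ForecastTime'

filled = fill_short_zero_runs(df['DSWRF_integ'], threshold=10)
print(filled)
```

With threshold=10, the 12-zero run should be left alone and the 8-zero run should be filled with 1. The helper returns a new series instead of modifying df in place, so the result can be compared with the original before it is assigned back.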

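I created this sample dataframe to make the result easier to check. I also built a final dataframe that includes all the intermediate pd.Series, so that each step is easier to follow: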

import pandas as pd

df = pd.DataFrame({'col1':[0,0,0,2,1,3,0,1,2,0,0,0,0,1]})

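Now, setting the threshold to 4, for example, all zeros should become 1 except the zeros in rows 9 to 12, since that run has exactly 4 zeros and is therefore not shorter than the threshold: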

result = df.copy()
th = 4
x = df.col1.eq(0)
g = x.astype(int).diff().fillna(0).ne(0).cumsum()
ix = x[x].groupby(g[x]).transform('size').lt(th) 
result.loc[ix[ix].index, 'col1'] = 1

df.assign(x=x, g=g, ix=ix, result=result.col1)

     col1   x    g    ix     result
0      0   True  0   True       1
1      0   True  0   True       1
2      0   True  0   True       1
3      2  False  1    NaN       2
4      1  False  1    NaN       1
5      3  False  1    NaN       3
6      0   True  2   True       1
7      1  False  3    NaN       1
8      2  False  3    NaN       2
9      0   True  4  False       0
10     0   True  4  False       0
11     0   True  4  False       0
12     0   True  4  False       0
13     1  False  5    NaN       1
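
One way to gain confidence in the vectorized steps is to compare them with a plain-Python loop on the same sample. The sketch below does that; the loop using itertools.groupby is only an illustrative reference and is not part of the answer above, and the names vectorized and expected are made up for this check.

```python
from itertools import groupby

import pandas as pd

df = pd.DataFrame({'col1': [0, 0, 0, 2, 1, 3, 0, 1, 2, 0, 0, 0, 0, 1]})
th = 4

# Vectorized version, following the steps shown above.
x = df.col1.eq(0)
g = x.astype(int).diff().fillna(0).ne(0).cumsum()
ix = x[x].groupby(g[x]).transform('size').lt(th)
vectorized = df.col1.copy()
vectorized.loc[ix[ix].index] = 1

# Reference version: walk the runs one by one in plain Python.
expected = []
for is_zero, run in groupby(df.col1.tolist(), key=lambda v: v == 0):
    run = list(run)
    if is_zero and len(run) < th:
        # Short run of zeros: replace every element with 1.
        expected.extend([1] * len(run))
    else:
        # Non-zero values, or a run of zeros that is too long: keep as is.
        expected.extend(run)

print(vectorized.tolist() == expected)
```

The loop is slow on large frames, which is the reason the question asks for a non-iterative approach, but it is handy as a check on small samples.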
