
How to check if the next 3 consecutive rows in a pandas column have the same value?

I have a pandas DataFrame with 3 columns: id, date, and value.

| id | date | value |
| --- | --- | --- |
| 1001 | 1-04-2021 | 61 |
| 1001 | 3-04-2021 | 61 |
| 1001 | 10-04-2021 | 61 |
| 1002 | 11-04-2021 | 13 |
| 1002 | 12-04-2021 | 12 |
| 1015 | 18-04-2021 | 42 |
| 1015 | 20-04-2021 | 42 |
| 1015 | 21-04-2021 | 43 |
| 2001 | 8-04-2021 | 27 |
| 2001 | 11-04-2021 | 27 |
| 2001 | 12-04-2021 | 27 |
| 2001 | 27-04-2021 | 27 |
| 2001 | 29-04-2021 | 27 |
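For anyone who wants to reproduce the problem, the table above can be rebuilt roughly as follows. This is only a sketch: the day-first date format is an assumption inferred from entries such as 18-04-2021, and parsing the dates is optional since only id and value matter for the question.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "id": [1001, 1001, 1001, 1002, 1002, 1015, 1015, 1015,
           2001, 2001, 2001, 2001, 2001],
    "date": ["1-04-2021", "3-04-2021", "10-04-2021", "11-04-2021",
             "12-04-2021", "18-04-2021", "20-04-2021", "21-04-2021",
             "8-04-2021", "11-04-2021", "12-04-2021", "27-04-2021",
             "29-04-2021"],
    "value": [61, 61, 61, 13, 12, 42, 42, 43, 27, 27, 27, 27, 27],
})

# Assumption: dates are day-first (day-month-year).
df["date"] = pd.to_datetime(df["date"], format="%d-%m-%Y")
```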

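For each id, I want to find the rows that belong to a run of 3 or more consecutive rows with the same value in the value column. Once such a run is identified, its rows should be marked as 1 in a separate column; all other rows should be marked as 0.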

So the final DataFrame would look like this:

| id | date | value | pattern |
| --- | --- | --- | --- |
| 1001 | 1-04-2021 | 61 | 1 |
| 1001 | 3-04-2021 | 61 | 1 |
| 1001 | 10-04-2021 | 61 | 1 |
| 1002 | 11-04-2021 | 13 | 0 |
| 1002 | 12-04-2021 | 12 | 0 |
| 1015 | 18-04-2021 | 42 | 0 |
| 1015 | 20-04-2021 | 42 | 0 |
| 1015 | 21-04-2021 | 43 | 0 |
| 2001 | 8-04-2021 | 27 | 1 |
| 2001 | 11-04-2021 | 27 | 1 |
| 2001 | 12-04-2021 | 27 | 1 |
| 2001 | 27-04-2021 | 27 | 1 |
| 2001 | 29-04-2021 | 27 | 1 |

Try using groupby:

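# diff().ne(0).cumsum() gives every run of equal consecutive values its own label;
# a run gets 1 when it has at least 3 rows within the same id.
df['pattern'] = (df.groupby(['id', df['value'].diff().ne(0).cumsum()])
                   ['id'].transform('size').ge(3).astype(int)
                )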
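The one-liner above can be unpacked into separate steps. The sketch below is one possible reading of it, not the answerer's own code: the function name `flag_runs` and the `min_run` parameter are hypothetical, and it compares each value with `shift()` instead of `diff()`, so it should also work for non-numeric values.

```python
import pandas as pd

def flag_runs(df, min_run=3):
    """Return 1 for rows inside a run of at least `min_run` equal values per id."""
    # True where the value differs from the previous row (the first row is always True).
    starts = df["value"].ne(df["value"].shift())
    # The cumulative sum turns those breakpoints into one label per run.
    run_id = starts.cumsum().rename("run_id")
    # Grouping by id as well splits any run that crosses an id boundary.
    run_len = df.groupby(["id", run_id])["value"].transform("size")
    return run_len.ge(min_run).astype(int)

# Small hypothetical frame: the run of five 5s is split 3 + 2 by the id change.
df = pd.DataFrame({
    "id":    [1, 1, 1, 2, 2, 2, 2],
    "value": [5, 5, 5, 5, 5, 7, 7],
})
df["pattern"] = flag_runs(df)
print(df["pattern"].tolist())  # [1, 1, 1, 0, 0, 0, 0]
```

Including id in the grouping keys is what stops a run that continues into the next id from being counted as a single run.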

How about this:

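import pandas as pd

def f(x):
    # x is the row-to-row difference of value within one id; its first entry is NaN.
    same = x.eq(0).to_numpy()  # same[i] is True when row i repeats row i-1
    y = [0] * len(x)
    for i in range(len(x) - 2):
        # Rows i, i+1 and i+2 share one value when the next two differences are zero.
        if same[i + 1] and same[i + 2]:
            y[i] = y[i + 1] = y[i + 2] = 1
    return pd.Series(y, index=x.index)

# Apply per id so that a run never spans two ids.
df['pattern'] = df.groupby('id')['value'].transform(lambda s: f(s.diff())).astype(int)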
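As a cross-check for either answer, the same rule can be written in plain Python with `itertools.groupby`, which yields maximal runs of consecutive equal keys. This is an illustrative sketch only; `flag_runs_py` and `min_run` are hypothetical names, and the input lists are copied from the question's table.

```python
from itertools import groupby

def flag_runs_py(ids, values, min_run=3):
    """Return 1 for rows in a run of at least `min_run` equal (id, value) pairs."""
    flags = []
    # Keying on the (id, value) pair means a run ends when either one changes.
    for _, run in groupby(zip(ids, values)):
        n = len(list(run))
        flags.extend([1 if n >= min_run else 0] * n)
    return flags

ids = [1001, 1001, 1001, 1002, 1002, 1015, 1015, 1015,
       2001, 2001, 2001, 2001, 2001]
values = [61, 61, 61, 13, 12, 42, 42, 43, 27, 27, 27, 27, 27]
print(flag_runs_py(ids, values))
# [1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

The printed list matches the pattern column in the expected output from the question.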
