[英]Pandas rolling function with overlap
I would like to apply a function to one pandas dataframe column which does the following task:我想将 function 应用于一个 pandas dataframe 列,该列执行以下任务:
The function I use at the moment is the following one:我现在用的function是下面这个:
Code代码
import pandas as pd
d = {'Cycle':[100,100,100,100,101,101,101,102,102,102,102,102,102,103,103,103,100,100,100,100,101,101,101,101]}
df = pd.DataFrame(data=d)
df.loc[:,'counter'] = df['Cycle'].to_numpy()
df.loc[:,'counter'] = df['counter'].rolling(2).apply(lambda x: x[0] if (x[0] == x[1]) else x[0]+1, raw=True)
print(df)
Output Output
Cycle counter
0 100 NaN
1 100 100.0
2 100 100.0
3 100 100.0
4 101 101.0
5 101 101.0
6 101 101.0
7 102 102.0
8 102 102.0
9 102 102.0
10 102 102.0
11 102 102.0
12 102 102.0
13 103 103.0
14 103 103.0
15 103 103.0
16 100 104.0
17 100 100.0
18 100 100.0
19 100 100.0
20 101 101.0
21 101 101.0
22 101 101.0
23 101 101.0
My goal is to get a dataframe similar to this one:我的目标是得到一个类似于这个的 dataframe:
Cycle counter
0 100 NaN
1 100 100.0
2 100 100.0
3 100 100.0
4 101 101.0
5 101 101.0
6 101 101.0
7 102 102.0
8 102 102.0
9 102 102.0
10 102 102.0
11 102 102.0
12 102 102.0
13 103 103.0
14 103 103.0
15 103 103.0
16 100 104.0
17 100 104.0
18 100 104.0
19 100 104.0
20 101 105.0
21 101 105.0
22 101 105.0
23 101 105.0
Best regards,此致,
Matteo马泰奥
We can use shift
and ne
(same as !=
) to check where the Cycle
column changes.我们可以使用
shift
和ne
(与!=
相同)来检查Cycle
列的更改位置。
Then we use cumsum
to make a counter which changes each time Cycle
changes.然后我们使用
cumsum
做一个计数器,每次Cycle
改变时都会改变。
We add the first value of Cycle
to the counter -1
, to let it start at 100
:我们将
Cycle
的第一个值添加到计数器-1
中,让它从100
开始:
groups = df['Cycle'].ne(df['Cycle'].shift()).cumsum()
df['counter'] = groups + df['Cycle'].iat[0] - 1
Cycle counter
0 100 100
1 100 100
2 100 100
3 100 100
4 101 101
5 101 101
6 101 101
7 102 102
8 102 102
9 102 102
10 102 102
11 102 102
12 102 102
13 103 103
14 103 103
15 103 103
16 100 104
17 100 104
18 100 104
19 100 104
20 101 105
21 101 105
22 101 105
23 101 105
Details: groups
gives us a counter starting at 1
:详细信息:
groups
给我们一个从1
开始的计数器:
print(groups)
0 1
1 1
2 1
3 1
4 2
5 2
6 2
7 3
8 3
9 3
10 3
11 3
12 3
13 4
14 4
15 4
16 5
17 5
18 5
19 5
20 6
21 6
22 6
23 6
Name: Cycle, dtype: int64
Another approach would be to identify the points in the Cycle column where the value changes using.diff().另一种方法是使用 .diff() 识别 Cycle 列中值发生变化的点。 Then at those points increment from the original initial cycle value and merge to the original dataframe forward filling the new values.
然后在这些点从原始初始循环值递增并合并到原始 dataframe 向前填充新值。
df2 = df[df['Cycle'].diff().apply(lambda x: x!=0)].reset_index()
df2['Target Count'] = df[df['Cycle'].diff().apply(lambda x: x!=0)].reset_index().reset_index().apply(lambda x: df.iloc[0,0] + x['level_0'], axis = 1)
df = df.merge(df2.drop('Cycle', axis = 1), right_on = 'index', left_index = True, how = 'left').ffill().set_index('index', drop = True)
def df.index.name
df
Cycle Target Count
0 100 100.0
1 100 100.0
2 100 100.0
3 100 100.0
4 101 101.0
5 101 101.0
6 101 101.0
7 102 102.0
8 102 102.0
9 102 102.0
10 102 102.0
11 102 102.0
12 102 102.0
13 103 103.0
14 103 103.0
15 103 103.0
16 100 104.0
17 100 104.0
18 100 104.0
19 100 104.0
20 101 105.0
21 101 105.0
22 101 105.0
23 101 105.0
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.