Pandas rolling function with overlap

Question

I would like to apply a function to a pandas DataFrame column that does the following:

I have a cycle counter that starts from a given value but sometimes restarts.
I would like the counter to keep increasing instead of restarting.

The function I use at the moment is the following:

Code

import pandas as pd

d = {'Cycle':[100,100,100,100,101,101,101,102,102,102,102,102,102,103,103,103,100,100,100,100,101,101,101,101]}
df = pd.DataFrame(data=d)

df.loc[:,'counter'] = df['Cycle'].to_numpy()
df.loc[:,'counter'] = df['counter'].rolling(2).apply(lambda x: x[0] if (x[0] == x[1]) else x[0]+1, raw=True)

print(df)

Output

    Cycle  counter
0     100      NaN
1     100    100.0
2     100    100.0
3     100    100.0
4     101    101.0
5     101    101.0
6     101    101.0
7     102    102.0
8     102    102.0
9     102    102.0
10    102    102.0
11    102    102.0
12    102    102.0
13    103    103.0
14    103    103.0
15    103    103.0
16    100    104.0
17    100    100.0
18    100    100.0
19    100    100.0
20    101    101.0
21    101    101.0
22    101    101.0
23    101    101.0

My goal is to get a dataframe like this one:

    Cycle  counter
0     100      NaN
1     100    100.0
2     100    100.0
3     100    100.0
4     101    101.0
5     101    101.0
6     101    101.0
7     102    102.0
8     102    102.0
9     102    102.0
10    102    102.0
11    102    102.0
12    102    102.0
13    103    103.0
14    103    103.0
15    103    103.0
16    100    104.0
17    100    104.0
18    100    104.0
19    100    104.0
20    101    105.0
21    101    105.0
22    101    105.0
23    101    105.0

How do I use the rolling function with one overlap?
Do you have any recommendation to reach my goal?

Best regards,

Matteo

Answer 1

We can use shift and ne (same as !=) to check where the Cycle column changes.

Then we use cumsum to make a counter that increases by one each time Cycle changes.

Finally, we add the first value of Cycle to the counter and subtract 1, so that it starts at 100:

groups = df['Cycle'].ne(df['Cycle'].shift()).cumsum()
df['counter'] = groups + df['Cycle'].iat[0] - 1

    Cycle  counter
0     100      100
1     100      100
2     100      100
3     100      100
4     101      101
5     101      101
6     101      101
7     102      102
8     102      102
9     102      102
10    102      102
11    102      102
12    102      102
13    103      103
14    103      103
15    103      103
16    100      104
17    100      104
18    100      104
19    100      104
20    101      105
21    101      105
22    101      105
23    101      105

Details: groups gives us a counter starting at 1:

print(groups)

0     1
1     1
2     1
3     1
4     2
5     2
6     2
7     3
8     3
9     3
10    3
11    3
12    3
13    4
14    4
15    4
16    5
17    5
18    5
19    5
20    6
21    6
22    6
23    6
Name: Cycle, dtype: int64
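
For reference, the three steps above can be wrapped in a small helper and checked against the target column from the question. This is only a sketch: the function name continue_counter is made up for illustration, and it assumes the counter should go up by exactly one on every change, even if Cycle jumps by more than one (which never happens in the sample data).

```python
import pandas as pd


def continue_counter(cycle: pd.Series) -> pd.Series:
    """Return a counter that goes up by one each time `cycle` changes value."""
    # True on the first row and on every row that differs from the previous one.
    changed = cycle.ne(cycle.shift())
    # Running total of changes: 1 for the first run, 2 for the second, ...
    run_id = changed.cumsum()
    # Offset so the first run keeps the original starting value.
    return run_id + cycle.iat[0] - 1


# Same 24 values as the Cycle column in the question.
cycle = pd.Series(
    [100] * 4 + [101] * 3 + [102] * 6 + [103] * 3 + [100] * 4 + [101] * 4,
    name="Cycle",
)
counter = continue_counter(cycle)
print(counter.tolist())
```

Unlike the rolling attempt in the question, the result stays an integer column and the first row is 100 rather than NaN.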

Answer 2

Another approach is to use .diff() to identify the points in the Cycle column where the value changes. At those points, increment from the initial cycle value, then merge the result back into the original dataframe and forward-fill the new values.

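# Rows where Cycle differs from the previous row (the first row counts too, since its diff is NaN).
df2 = df[df['Cycle'].diff().apply(lambda x: x!=0)].reset_index()
# Number those rows 0, 1, 2, ... and offset by the first Cycle value.
df2['Target Count'] = df[df['Cycle'].diff().apply(lambda x: x!=0)].reset_index().reset_index().apply(lambda x: df.iloc[0,0] + x['level_0'], axis = 1)
# Merge back onto the full frame and forward-fill the gaps between change points.
df = df.merge(df2.drop('Cycle', axis = 1), right_on = 'index', left_index = True, how = 'left').ffill().set_index('index', drop = True)
df.index.name = None  # drop the leftover 'index' label
df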

    Cycle  Target Count
0     100         100.0
1     100         100.0
2     100         100.0
3     100         100.0
4     101         101.0
5     101         101.0
6     101         101.0
7     102         102.0
8     102         102.0
9     102         102.0
10    102         102.0
11    102         102.0
12    102         102.0
13    103         103.0
14    103         103.0
15    103         103.0
16    100         104.0
17    100         104.0
18    100         104.0
19    100         104.0
20    101         105.0
21    101         105.0
22    101         105.0
23    101         105.0
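
The same diff-based idea can probably be written without the merge step. The sketch below is not the code from this answer; it is a shorter variant that counts the change points with cumsum, shown on a smaller made-up Cycle column so the result is easy to check by hand.

```python
import pandas as pd

# Shortened sample data (an assumption for illustration, not the question's full column).
df = pd.DataFrame({"Cycle": [100, 100, 101, 101, 102, 100, 100, 101]})

# diff() is NaN on the first row; treat that row as "no change".
changed = df["Cycle"].diff().fillna(0).ne(0)
# Count the changes seen so far and add them to the starting cycle value.
df["Target Count"] = changed.cumsum() + df["Cycle"].iat[0]
print(df)
```

Because nothing is merged or forward-filled, no NaN values are introduced and the new column keeps an integer dtype.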

Answer 1
0, accepted, 2019-10-27 21:30:54

Answer 2
0, 2019-10-27 21:41:50
