Pandas: identifying consecutive numbers in a column with repeated elements

Question

Sorry if the question is not clear; let me describe my issue in this post. I have the following dataframe:

        value                created_at   t_diff  flag_1
0   18.930542 2019-03-03 21:43:08-05:00 00:00:00       1
1   18.895210 2019-03-03 21:44:09-05:00 00:00:00       1
2   18.895210 2019-03-03 21:45:09-05:00 00:00:00       1
3   18.885010 2019-03-03 21:46:10-05:00 00:04:04       2
4    0.000000 2019-03-03 21:47:11-05:00 00:04:04       2
5    0.000000 2019-03-03 21:48:12-05:00 00:04:04       2
6    0.000000 2019-03-03 21:49:13-05:00 00:04:04       2
7    0.000000 2019-03-03 21:50:14-05:00 00:04:04       2
8   18.857025 2019-03-03 21:51:14-05:00 00:00:00       3
9   18.847290 2019-03-03 21:52:15-05:00 00:00:00       3
10  18.847290 2019-03-03 21:53:17-05:00 00:00:00       3
11  18.873283 2019-03-03 21:54:17-05:00 00:00:00       3
12  18.873283 2019-03-03 21:55:19-05:00 00:00:00       3
13  18.837677 2019-03-03 21:56:19-05:00 00:00:00       3
20  18.830170 2019-03-03 22:03:25-05:00 00:00:00       5
21  18.826149 2019-03-03 22:04:26-05:00 00:00:00       5
22  18.826149 2019-03-03 22:05:27-05:00 00:00:00       5
23  18.830795 2019-03-03 22:06:28-05:00 00:00:00       5

From the column 'flag_1', I'd like to identify the elements that, despite being repeated, form a succession of consecutive numbers. The outcome I want looks like this:

        value                created_at   t_diff  flag_1  flag_2
0   18.930542 2019-03-03 21:43:08-05:00 00:00:00       1       1
1   18.895210 2019-03-03 21:44:09-05:00 00:00:00       1       1
2   18.895210 2019-03-03 21:45:09-05:00 00:00:00       1       1
3   18.885010 2019-03-03 21:46:10-05:00 00:04:04       2       1
4    0.000000 2019-03-03 21:47:11-05:00 00:04:04       2       1
5    0.000000 2019-03-03 21:48:12-05:00 00:04:04       2       1
6    0.000000 2019-03-03 21:49:13-05:00 00:04:04       2       1
7    0.000000 2019-03-03 21:50:14-05:00 00:04:04       2       1
8   18.857025 2019-03-03 21:51:14-05:00 00:00:00       3       1
9   18.847290 2019-03-03 21:52:15-05:00 00:00:00       3       1
10  18.847290 2019-03-03 21:53:17-05:00 00:00:00       3       1
11  18.873283 2019-03-03 21:54:17-05:00 00:00:00       3       1
12  18.873283 2019-03-03 21:55:19-05:00 00:00:00       3       1
13  18.837677 2019-03-03 21:56:19-05:00 00:00:00       3       1
20  18.830170 2019-03-03 22:03:25-05:00 00:00:00       5       2
21  18.826149 2019-03-03 22:04:26-05:00 00:00:00       5       2
22  18.826149 2019-03-03 22:05:27-05:00 00:00:00       5       2
23  18.830795 2019-03-03 22:06:28-05:00 00:00:00       5       2

The column named 'flag_2' should be populated with a numeric identifier for each of these "successions" of consecutive, repeated numbers: 1 for the first, 2 for the second, 3 for the third, and so on.

I have been trying to do this indirectly: I call df.flag_1.unique(), then use more-itertools to build a nested list, and then loop over that list, slicing the dataframe with isin from Pandas.

I'd like to know if there's a way to do all this with Pandas alone, without using more-itertools and the rest of my approach.

Can you help me out please? Thanks in advance!

Answer 1

You can create it using diff and cumsum. The logic is that within one run of consecutive values, the difference between a row and the previous row should not be greater than 1: in your example, flag_1 either increases by one or stays the same (no change, so the difference is 0). A difference greater than 1 marks the start of a new run, and cumsum counts those starts.

df.flag_1.diff().gt(1).cumsum()+1
Out[351]: 
0     1
1     1
2     1
3     1
4     1
5     1
6     1
7     1
8     1
9     1
10    1
11    1
12    1
13    1
20    2
21    2
22    2
23    2
Name: flag_1, dtype: int32
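
To make the one-liner easier to follow, here is a minimal sketch that splits it into its three steps and stores the result as 'flag_2'. Only the flag_1 column and the index are rebuilt from the question; the variable names step and new_run are illustrative, and the sketch assumes flag_1 never decreases, as in the sample data.

```python
import pandas as pd

# flag_1 values and index copied from the question's dataframe
# (the other columns are omitted because they are not needed here).
index = list(range(14)) + [20, 21, 22, 23]
flag_1 = [1] * 3 + [2] * 5 + [3] * 6 + [5] * 4
df = pd.DataFrame({"flag_1": flag_1}, index=index)

# Step 1: difference between each row and the previous one
# (NaN for the first row, which has no predecessor).
step = df["flag_1"].diff()

# Step 2: True wherever the value jumps by more than 1, i.e. a new run starts.
# The NaN in the first row compares as False.
new_run = step.gt(1)

# Step 3: running count of the jumps, plus 1 so the first run is labelled 1.
df["flag_2"] = new_run.cumsum() + 1

print(df)
```

With the sample data, rows 0 to 13 get flag_2 = 1 and rows 20 to 23 get flag_2 = 2, which matches the desired output in the question. If flag_1 could also jump downwards, the comparison in step 2 would need adjusting, for example by taking the absolute difference first.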

Answer 1: accepted, 2019-03-22 00:35:33
