
How to filter for rows that have greater than a specified count of consecutive integers

I have a dataframe like this:

Frame_Number  Parts  X  Y
7             Y      7  9
7             G      :  :
8             Y      :  :
8             Y      :  :
8             Y      :  :
9             :      :  :
10
18
18
18
18
19
20
20
21
21
22
23
24
24
25
25
25
26
27
28
29
29
29
29
30
42
45
80
80
80
81
81
81
82
82
83
109
109
120
121
122
123
124
125
126
127
128
129
130
131
132
132
132
133
190
200
202
204
205
206
:
1000

I want to select the subset of this dataframe that contains at least 25 consecutive numbers.

For example, my dataset contains [1, 1, 2, 3, 5, 6, 8, 9, 10, ..., 33, 34, 35, 36, 37, 40, ..., 65, ..., 1000].

So here, the values from 8 to 37 are consecutive, giving more than 25 (distinct) numbers. Likewise, the values from 40 to 65 form a run of at least 25 consecutive numbers. I want to select the sets of rows that belong to such runs of 25 or more consecutive values.
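As a check on the example above, here is a minimal pandas sketch that lists the runs in that list. It assumes "10..33" and "40,.65" stand for the full ranges 8 to 37 and 40 to 65 described in the text; the variable names are illustrative. It labels every break in the distinct sorted values (a step other than exactly 1) and then measures each labelled run:

```python
import pandas as pd

# The question's example, assuming the elided parts are the full ranges
# 8..37 and 40..65 that the text describes as consecutive.
values = [1, 1, 2, 3, 5, 6] + list(range(8, 38)) + list(range(40, 66)) + [1000]
frames = pd.Series(values, name="Frame_Number")

# Work on distinct sorted values so duplicates neither break nor lengthen a run.
uniq = pd.Series(sorted(frames.unique()))

# True wherever the step from the previous value is not exactly 1 (the first
# diff is NaN, which also counts as a break); cumsum turns breaks into run ids.
run_id = (uniq.diff() != 1).cumsum()

# First value, last value and number of distinct values in each run.
runs = uniq.groupby(run_id).agg(["min", "max", "count"])

# Runs of at least 25 distinct consecutive numbers.
qualifying = runs[runs["count"] >= 25]
print(qualifying)
```

Only the 8 to 37 run (30 values) and the 40 to 65 run (26 values) pass the threshold.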

  1. Create groups of consecutive integers:

You can create a series s that holds the cumulative count of breaks in the sequence, i.e. positions where a value is not exactly one more than the previous value. Consecutive rows therefore get the same value in s, because they belong to the same "group" of consecutive integers.

  2. Count the consecutive integers within each group:

Then create m by counting the rows in each group with .groupby(s) and broadcasting that count back to every row with .transform('count'), so each row carries the length of the run it belongs to.

  3. Filter for groups whose number of consecutive values is > n:

Filter df by keeping the rows where m is greater than the threshold you choose (25 in this case; the code below uses 10 because no run in the sample shown is that long).
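To make the three steps concrete, here is a toy run on hypothetical data (a column A with the runs 18 to 20, 30 to 31 and 50) that prints s and m next to the values, using a threshold of 2 instead of 25 so that something survives:

```python
import pandas as pd

# Hypothetical toy data: three runs, 18-20, 30-31 and 50 on its own.
df = pd.DataFrame({"A": [18, 19, 20, 30, 31, 50]})

# Step 1: True marks the start of a new run; cumsum turns starts into group ids.
s = (df["A"] != df["A"].shift(1) + 1).cumsum()

# Step 2: broadcast each group's size back onto every row of that group.
m = df.groupby(s)["A"].transform("count")

# Step 3: keep rows whose run is longer than the threshold (2 here).
result = df[m > 2]

print(pd.DataFrame({"A": df["A"], "s": s, "m": m}))
print(result)
```

Here s is 1, 1, 1, 2, 2, 3 and m is 3, 3, 3, 2, 2, 1, so only the rows 18, 19 and 20 are kept.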


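import pandas as pd

# df holds the frame numbers in a single column 'A' (Frame_Number in the question)
df = df.sort_values('A').drop_duplicates(subset='A')  # sort, keep one row per value
s = (df['A'] != df['A'].shift(1) + 1).cumsum()        # new group id at every break in the run
m = df.groupby(s)['A'].transform('count')             # run length, repeated on every row
df = df[m > 10]                                       # 10 fits this sample; use m >= 25 for "at least 25"
df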
Out[1]: 
      A
7    18
11   19
12   20
15   21
16   22
17   23
18   24
20   25
23   26
24   27
25   28
26   29
30   30
44  120
45  121
46  122
47  123
48  124
49  125
50  126
51  127
52  128
53  129
54  130
55  131
56  132
59  133
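The question's DataFrame also has Parts, X and Y columns and repeated frame numbers, which the deduplicating version above discards. A minimal sketch under stated assumptions (the helper name filter_long_runs and the sample rows are hypothetical, and it keeps runs of "at least min_len" rather than "greater than") computes run lengths on the distinct sorted values and maps them back onto every original row, so the DataFrame itself does not need to be sorted:

```python
import pandas as pd


def filter_long_runs(df: pd.DataFrame, col: str, min_len: int) -> pd.DataFrame:
    """Keep every row whose `col` value lies in a run of at least `min_len`
    distinct consecutive integers. Duplicate rows and other columns are kept.
    (Hypothetical helper, not part of pandas.)"""
    uniq = pd.Series(sorted(df[col].unique()))
    run_id = (uniq.diff() != 1).cumsum()           # run id per distinct value
    run_len = run_id.map(run_id.value_counts())    # length of that value's run
    length_by_value = pd.Series(run_len.to_numpy(), index=uniq.to_numpy())
    # Look up each row's run length through its value, then filter.
    return df[df[col].map(length_by_value) >= min_len]


# Hypothetical rows shaped like the question's data.
frames = pd.DataFrame({
    "Frame_Number": [7, 7, 8, 18, 19, 20, 20, 21, 50],
    "Parts": ["Y", "G", "Y", "Y", "G", "Y", "G", "Y", "Y"],
})

result = filter_long_runs(frames, "Frame_Number", min_len=4)
print(result)
```

With min_len=4 only the 18 to 21 run qualifies, and both rows with frame 20 are kept. For the question, call it with min_len=25.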

Another option, which keeps the duplicates:

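import pandas as pd

# d2 is the original, unmodified DataFrame (sorted by 'A', as in the question)
df = d2.copy()
# a new group starts only where the value neither repeats nor increases by exactly 1
s = ((df['A'] != df['A'].shift(1)) & (df['A'] != df['A'].shift(1) + 1)).cumsum()
# nunique counts distinct values, so repeated frame numbers don't inflate the run length
m = df.groupby(s)['A'].transform('nunique')
df = df[m > 10]
df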
Out[1]: 
      A
7    18
8    18
9    18
10   18
11   19
12   20
13   20
14   21
15   21
16   22
17   23
18   24
19   24
20   25
21   25
22   25
23   26
24   27
25   28
26   29
27   29
28   29
29   29
30   30
44  120
45  121
46  122
47  123
48  124
49  125
50  126
51  127
52  128
53  129
54  130
55  131
56  132
57  132
58  132
59  133
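Once duplicates are kept, it matters whether the run length is measured with 'count' or 'nunique': 'count' counts rows, so repeated frame numbers lengthen a run, while 'nunique' counts distinct values, which matches the question's "(different) numbers". A small check on hypothetical sorted data, where the run 5 to 7 has only 3 distinct values but 8 rows, with an illustrative threshold of 5:

```python
import pandas as pd

# Hypothetical sorted data: run 5-7 spans 8 rows but only 3 distinct values.
df = pd.DataFrame({"A": [5, 5, 5, 5, 6, 6, 7, 7, 20]})

prev = df["A"].shift(1)
# New group only when the value neither repeats nor steps up by exactly 1.
s = ((df["A"] != prev) & (df["A"] != prev + 1)).cumsum()

rows_per_run = df.groupby(s)["A"].transform("count")      # includes duplicates
values_per_run = df.groupby(s)["A"].transform("nunique")  # distinct values only

kept_by_count = df[rows_per_run > 5]      # keeps the 5-7 run (8 rows > 5)
kept_by_nunique = df[values_per_run > 5]  # drops it (3 values, not > 5)

print(pd.DataFrame({"A": df["A"], "count": rows_per_run, "nunique": values_per_run}))
```

Only the 'nunique' version rejects a short run that is padded out by repeated values.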

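Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original post's address. For any questions, contact: yoyou2525@163.com.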

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM