
Given a data frame, how can I check that the values of a column are in increasing order with no missing numbers?

I have a data frame that has values like this:

Number 
1
2
3
4
5
6
7
8
9
10
12
13
14
15
16
18
20
21
22

By using the sorted function of pandas, I have values that are increasing, but I want to check for and spot any missing values and, in that case, report the start and end of each set of consecutive values. For example, in this case it should return [1,10],[12,16],[20,22]. I want to drop 18 here, for example: it is present, but it is not part of a consecutive run. Any lead on how to approach this problem?

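You can flag the rows where the difference from the previous value is not equal to 1, take the cumulative sum of that flag to label each consecutive run, and group by it to get the minimal and maximal values of each run. Then remove rows where the minimum and maximum are the same (like 18 here), and finally convert to nested lists: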

df1 = df.groupby(df['Number'].diff().ne(1).cumsum())['Number'].agg(['min','max'])
print (df1)
        min  max
Number          
1         1   10
2        12   16
3        18   18
4        20   22


df1 = df1[df1['min'].ne(df1['max'])]
print (df1)
        min  max
Number          
1         1   10
2        12   16
4        20   22


out = [list(x) for x in df1.to_numpy()]
print (out)
[[1, 10], [12, 16], [20, 22]]
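For anyone who wants to run the steps above end to end, here is a minimal self-contained sketch. The data is rebuilt from the values in the question, and `consecutive_ranges` is a hypothetical helper name chosen for illustration, not a pandas function. It assumes the input is already sorted in increasing order.

```python
import pandas as pd


def consecutive_ranges(values):
    """Return [start, end] for every run of consecutive integers longer than one value.

    Assumes `values` is already sorted in increasing order.
    """
    s = pd.Series(values)
    # A new run starts wherever the step from the previous value is not 1.
    # The first diff is NaN, which also counts as "not 1", so row 0 starts a run.
    run_id = s.diff().ne(1).cumsum()
    bounds = s.groupby(run_id).agg(['min', 'max'])
    # Drop runs that contain a single value, such as the isolated 18.
    bounds = bounds[bounds['min'].ne(bounds['max'])]
    return bounds.to_numpy().tolist()


numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 16, 18, 20, 21, 22]
result = consecutive_ranges(numbers)
print(result)
# [[1, 10], [12, 16], [20, 22]]
```

Using `.to_numpy().tolist()` returns plain Python integers, which may be more convenient than NumPy scalars if the result is serialized later.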

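You could start by identifying the groups of consecutive values and taking the first and last values of each group. Then drop the groups that contain only one value (such as 18) and convert to a list:

# Label each run of consecutive values: a new group starts wherever the step is not 1
g = df.Number.diff().fillna(1).ne(1).cumsum()
# First and last row of each group (relies on pandas < 2.0, where nth() returns the group key as the index)
out = df.groupby(g).nth((0,-1))
# Single-value groups appear only once in the index, so keep only the duplicated keys
out[out.index.duplicated(False)].groupby(level=0).agg(list).Number.tolist()
# [[1, 10], [12, 16], [20, 22]]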
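Since pandas 2.0, `GroupBy.nth` acts as a filter and keeps the original row index, so the index-based duplicate check above may not behave the same way on newer versions. A possible variant of the same idea, sketched here under the assumption that the column is already sorted, aggregates `first`, `last` and `size` per group instead. The name `stats` is just an illustrative variable.

```python
import pandas as pd

df = pd.DataFrame({'Number': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
                              12, 13, 14, 15, 16, 18, 20, 21, 22]})

# Same grouping key as above: the counter increases at every break in the sequence.
g = df['Number'].diff().fillna(1).ne(1).cumsum()

# First value, last value and number of rows for each consecutive run.
stats = df.groupby(g)['Number'].agg(['first', 'last', 'size'])

# Keep only runs with more than one value, then convert to nested lists.
out = stats.loc[stats['size'] > 1, ['first', 'last']].to_numpy().tolist()
print(out)
# [[1, 10], [12, 16], [20, 22]]
```

Filtering on `size` rather than on duplicated index labels makes the "drop single-value groups" step explicit and independent of how `nth` indexes its result.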

   
