Given a data frame, how can I check that a column's values are in increasing order with no missing numbers?

Question

I have a data frame that has values like this:

After sorting with pandas, my values are increasing, but I want to check for and spot any missing values and, in that case, report the start and end of each set of consecutive values. For example, in this case it should return [1,10],[12,16],[20,22]. I want to drop 18 here, for example: it is present, but it is not consecutive with anything. Any lead on how to approach this problem?
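The sample data did not survive in the post. As a minimal sketch, a frame consistent with the expected output could look like the one below. The values are an assumption reconstructed from that output, and the column name `Number` is taken from the answers; neither is the asker's original data.

```python
import pandas as pd

# Assumed sample data: reconstructed from the expected output, not the asker's frame.
numbers = [*range(1, 11), *range(12, 17), 18, *range(20, 23)]
df = pd.DataFrame({"Number": numbers})

# Values absent between the smallest and largest entry.
lo, hi = int(df["Number"].min()), int(df["Number"].max())
missing = sorted(set(range(lo, hi + 1)) - set(df["Number"].tolist()))
print(missing)  # [11, 17, 19]
```

With this data, 18 is present but both of its neighbours are missing, which is why it should not be reported as a range.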

Answer 1

You can build a group key by testing where the difference between consecutive values is not equal to 1 and taking the cumulative sum. Then aggregate the minimum and maximum of each group, remove the rows whose minimum and maximum are equal (like 18 here), and finally convert to nested lists:

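# Start a new group wherever the step from the previous value is not 1,
# then take the min and max of each group.
df1 = df.groupby(df['Number'].diff().ne(1).cumsum())['Number'].agg(['min','max'])
print(df1)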
        min  max
Number          
1         1   10
2        12   16
3        18   18
4        20   22


# Drop single-value groups (min == max), such as 18.
df1 = df1[df1['min'].ne(df1['max'])]
print(df1)
        min  max
Number          
1         1   10
2        12   16
4        20   22


# tolist() yields plain Python ints, so the printed lists show bare numbers.
out = df1.to_numpy().tolist()
print(out)
[[1, 10], [12, 16], [20, 22]]
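To see why the `diff().ne(1).cumsum()` key separates the runs, it can help to print each intermediate step. The sketch below uses a shorter, made-up series (an assumption for illustration, not the question's data):

```python
import pandas as pd

# Assumed toy data: runs 1-3 and 7-8, with 5 isolated.
s = pd.Series([1, 2, 3, 5, 7, 8], name="Number")

step = s.diff()        # NaN, 1, 1, 2, 2, 1
breaks = step.ne(1)    # True where a new run starts (NaN != 1 is True)
key = breaks.cumsum()  # 1, 1, 1, 2, 3, 3

runs = s.groupby(key).agg(["min", "max"])
runs = runs[runs["min"].ne(runs["max"])]  # drop single-value runs such as 5
result = runs.to_numpy().tolist()
print(result)  # [[1, 3], [7, 8]]
```

Every value that does not continue the previous one bumps the running total, so all members of one consecutive run share the same key.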

Answer 2

You could start by identifying the groups of consecutive values and taking the first and last values of each group. Then drop the groups that contain only one value (such as 18) and convert to a list:

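# Group key: increases by 1 wherever the step from the previous value is not 1.
g = df.Number.diff().fillna(1).ne(1).cumsum()
# First and last row of each group (written for pandas < 2.0, where nth() indexes the result by group key).
out = df.groupby(g).nth((0,-1))
# Keep keys that appear twice (groups with more than one row) and collect each pair as a list.
out[out.index.duplicated(False)].groupby(level=0).agg(list).Number.tolist()
# [[1, 10], [12, 16], [20, 22]]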
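The `nth((0,-1))` step depends on how `GroupBy.nth` labels its result, which changed in pandas 2.0: it now acts as a filter and keeps the original row index. As a hedged alternative that does not rely on that behaviour, the same idea can be sketched with the `first`, `last` and `size` aggregations. The sample frame is an assumption reconstructed from the expected output in the question.

```python
import pandas as pd

# Assumed sample data, reconstructed from the expected output in the question.
df = pd.DataFrame({"Number": [*range(1, 11), *range(12, 17), 18, *range(20, 23)]})

# Same grouping key as in the answer: a new group starts wherever the step is not 1.
g = df["Number"].diff().fillna(1).ne(1).cumsum()

# First and last value of each run, plus the run length.
runs = df.groupby(g)["Number"].agg(["first", "last", "size"])

# Keep only runs with more than one value, then convert to nested lists.
out = runs.loc[runs["size"] > 1, ["first", "last"]].to_numpy().tolist()
print(out)  # [[1, 10], [12, 16], [20, 22]]
```

Filtering on `size` replaces the duplicated-index trick, so the result does not depend on which index `nth` returns.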

Answer 1: score 2, accepted, 2020-11-09 13:34:53

Answer 2: score 1, 2020-11-09 13:28:40
