How can I check that the values of a data frame column are in increasing order with no missing numbers?
I have a data frame with values like this:
Number
1
2
3
4
5
6
7
8
9
10
12
13
14
15
16
18
20
21
22
After sorting with pandas, the values are increasing, but I want to check whether any values are missing and, if so, report the start and end of each run of consecutive values. For example, in this case it should return [1,10],[12,16],[20,22]. I want to drop a value such as 18 here: it is present, but it is not part of a consecutive run.
Any lead on how to approach this problem?
You can group the rows by the cumulative sum of a flag that marks where the difference between successive values is not equal to 1, then get the minimal and maximal value of each group, remove rows where the minimal and maximal values are the same (like 18 here), and finally convert to nested lists:
df1 = df.groupby(df['Number'].diff().ne(1).cumsum())['Number'].agg(['min','max'])
print (df1)
        min  max
Number
1         1   10
2        12   16
3        18   18
4        20   22
df1 = df1[df1['min'].ne(df1['max'])]
print (df1)
        min  max
Number
1         1   10
2        12   16
4        20   22
out = [list(x) for x in df1.to_numpy()]
print (out)
[[1, 10], [12, 16], [20, 22]]
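For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same diff/cumsum idea. The helper name `consecutive_runs`, its `min_len` parameter and the `has_gaps` flag are my own additions for illustration, not part of the answer above. It filters on the group size instead of comparing min and max, which should give the same result for sorted, unique integers.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame(
    {"Number": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 16, 18, 20, 21, 22]}
)


def consecutive_runs(s, min_len=2):
    """Return [start, end] for each run of consecutive integers in s.

    Hypothetical helper: assumes s is sorted ascending with no duplicates.
    """
    # A new run starts wherever the step from the previous value is not 1
    # (the first row has a NaN difference, so it also starts a run).
    run_id = s.diff().ne(1).cumsum()
    runs = s.groupby(run_id).agg(["min", "max", "count"])
    # Drop runs that are too short, e.g. the lone value 18.
    runs = runs[runs["count"] >= min_len]
    return runs[["min", "max"]].to_numpy().tolist()


# True if at least one step between neighbouring values is not exactly 1.
has_gaps = not df["Number"].diff().dropna().eq(1).all()
runs = consecutive_runs(df["Number"])

print(has_gaps)  # True
print(runs)      # [[1, 10], [12, 16], [20, 22]]
```

The `has_gaps` check answers the yes/no part of the question on its own; the run list is only needed when you want to know where the gaps are.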
You could start by identifying the groups of consecutive values and taking the first and last value of each group. Then drop the groups that contain only one value (such as 18) and convert to a list:
g = df.Number.diff().fillna(1).ne(1).cumsum()
out = df.groupby(g).nth((0,-1))
out[out.index.duplicated(False)].groupby(level=0).agg(list).Number.tolist()
# [[1, 10], [12, 16], [20, 22]]
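A note on versions: the snippet above relies on `nth((0, -1))` returning rows indexed by the group key. As far as I know, from pandas 2.0 onward `groupby(...).nth(...)` acts as a filter and keeps the original row index, so the `index.duplicated(False)` step may select nothing there. Below is a sketch of the same first/last idea that should not depend on that behaviour. It swaps `nth` for `agg(['first', 'last'])`, which is my substitution rather than the original answer's method, and the names `ends` and `out2` are made up for the example.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame(
    {"Number": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 16, 18, 20, 21, 22]}
)

# Same grouping key as above: a counter that increases at every gap.
g = df["Number"].diff().fillna(1).ne(1).cumsum()

# First and last value of each group of consecutive numbers.
ends = df.groupby(g)["Number"].agg(["first", "last"])

# A group with a single value has first == last (e.g. 18), so drop those.
out2 = ends[ends["first"].ne(ends["last"])].to_numpy().tolist()
print(out2)  # [[1, 10], [12, 16], [20, 22]]
```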