
How can I check, given a data frame, that the values of a column are in increasing order without any missing number?

I have a data frame with values like this:

Number 
1
2
3
4
5
6
7
8
9
10
12
13
14
15
16
18
20
21
22

After sorting with pandas, the values are in increasing order, but I want to detect whether any values are missing and, if so, report the start and end of each run of consecutive values. For example, in this case it should return [1,10],[12,16],[20,22]. I want to discard 18, for example: it is present, but it is not part of a consecutive run. Any lead on how to approach this problem?

You can mark every row whose difference from the previous row is not equal to 1, take the cumulative sum of that mask to label each consecutive run, and aggregate the minimal and maximal values per run. Then remove the rows where the minimum and maximum are the same (like 18 here), and finally convert to nested lists:

# diff().ne(1) is True where a new run starts; cumsum() turns that into a run label
df1 = df.groupby(df['Number'].diff().ne(1).cumsum())['Number'].agg(['min','max'])
print(df1)
        min  max
Number          
1         1   10
2        12   16
3        18   18
4        20   22


# keep only runs with more than one value (min differs from max)
df1 = df1[df1['min'].ne(df1['max'])]
print(df1)
        min  max
Number          
1         1   10
2        12   16
4        20   22


# tolist() yields plain Python ints, so the printed result does not depend on the NumPy scalar repr
out = df1.to_numpy().tolist()
print(out)
[[1, 10], [12, 16], [20, 22]]
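
For reference, the steps above can be collected into one self-contained script. This is only a sketch: the helper name `consecutive_runs` and its `min_length` parameter are made up for illustration and are not part of pandas, and the data frame is rebuilt from the numbers in the question. It assumes the column is already sorted and holds integers without duplicates.

```python
import pandas as pd


def consecutive_runs(numbers: pd.Series, min_length: int = 2) -> list[list[int]]:
    """Return [start, end] for each run of consecutive integers (hypothetical helper)."""
    # A new run starts wherever the step from the previous value is not exactly 1.
    # The first row has a NaN difference, so it also starts a run.
    run_id = numbers.diff().ne(1).cumsum()
    runs = numbers.groupby(run_id).agg(["min", "max", "count"])
    # Keep only runs that contain at least `min_length` values.
    runs = runs[runs["count"] >= min_length]
    return runs[["min", "max"]].to_numpy().tolist()


# Data from the question: 1..10, 12..16, 18, 20..22
df = pd.DataFrame({"Number": [*range(1, 11), *range(12, 17), 18, *range(20, 23)]})

result = consecutive_runs(df["Number"])
print(result)  # [[1, 10], [12, 16], [20, 22]]
```

Passing `min_length=1` would keep the isolated value as well, giving `[18, 18]` as its own run.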

You could start by identifying the groups of consecutive values and taking the first and last values of each group. Then drop the groups that contain only one value (such as 18) and convert to a list:

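# label each run of consecutive values; a new run starts where the step is not 1
g = df['Number'].diff().fillna(1).ne(1).cumsum()
# first and last value of every run; agg is used instead of groupby.nth because
# nth acts as a filter and keeps the original row index since pandas 2.0
out = df.groupby(g)['Number'].agg(['first', 'last'])
# drop single-value runs such as 18 and convert to nested lists
out[out['first'].ne(out['last'])].to_numpy().tolist()
# [[1, 10], [12, 16], [20, 22]]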
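
If you only need the yes/no check from the title, plus the numbers that are missing, something like the following may be enough. It is a rough sketch under the same assumptions as above (a sorted integer column without duplicates); the variable names `is_gap_free` and `missing` are illustrative.

```python
import pandas as pd

# Data from the question: 1..10, 12..16, 18, 20..22
df = pd.DataFrame({"Number": [*range(1, 11), *range(12, 17), 18, *range(20, 23)]})
s = df["Number"]

# True only if every step between neighbouring rows is exactly +1
# (dropna() discards the NaN difference of the first row).
is_gap_free = bool(s.diff().dropna().eq(1).all())

# Integers between the smallest and largest value that never appear in the column.
expected = range(int(s.min()), int(s.max()) + 1)
missing = sorted(set(expected) - set(s))

print(is_gap_free)  # False
print(missing)      # [11, 17, 19]
```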

   


