Find the smallest positive number not in list

I have a list in Python like this:

myList = [1,14,2,5,3,7,8,12]

How can I easily find the first unused value (in this case 4)?

I came up with several different ways:

Iterate the first number not in set

I didn't want the shortest code (which might be the set-difference trickery) but something that could have a good running time.

This might be one of the best approaches proposed here. My tests show that it can be substantially faster than the set-difference approach, especially if the hole is near the beginning:

from itertools import count, filterfalse # ifilterfalse on py2

A = [1,14,2,5,3,7,8,12]
print(next(filterfalse(set(A).__contains__, count(1))))

The list is turned into a set, whose __contains__(x) method corresponds to x in A. count(1) creates a counter that counts from 1 to infinity. filterfalse then consumes numbers from the counter until it finds one that is not in the set, and next() returns that first number.
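For readers less familiar with itertools, the one-liner can be unrolled into a plain loop. This is only a sketch of the equivalent logic; first_missing is a name made up here for illustration and does not appear in the answer.

```python
def first_missing(values):
    seen = set(values)        # same role as set(A) in the one-liner
    candidate = 1             # count(1) starts here
    while candidate in seen:  # filterfalse skips candidates found in the set
        candidate += 1
    return candidate          # next() hands back the first survivor


print(first_missing([1, 14, 2, 5, 3, 7, 8, 12]))  # 4
```

The itertools version keeps the loop in C-implemented iterators, which is presumably where its speed advantage comes from; the plain loop is easier to step through.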

Timing for len(a) = 100000, randomized, where the sought-after number is 8:

>>> timeit(lambda: next(filterfalse(set(a).__contains__, count(1))), number=100)
0.9200698399945395
>>> timeit(lambda: min(set(range(1, len(a) + 2)) - set(a)), number=100)
3.1420603669976117

Timing for len(a) = 100000, ordered, where the first free number is 100001:

>>> timeit(lambda: next(filterfalse(set(a).__contains__, count(1))), number=100)
1.520096342996112
>>> timeit(lambda: min(set(range(1, len(a) + 2)) - set(a)), number=100)
1.987783643999137

(Note that this is Python 3, where range is the equivalent of the Python 2 xrange.)

Use heapq

The asymptotically good answer: heapq with enumerate

from heapq import heapify, heappop
from functools import partial

# A = [1,2,3] also works
A = [1,14,2,5,3,7,8,12]

end = 2 ** 61      # these are different and neither of them can be the 
sentinel = 2 ** 62 # first gap (unless you have 2^64 bytes of memory).

heap = list(A)
heap.append(end)
heapify(heap)

print(next(n for n, v in enumerate(
     iter(partial(heappop, heap), sentinel), 1) if n != v))

Now, the one above could be the preferred solution if written in C, but heapq is written in Python and most probably slower than many other alternatives that mainly use C code.

Just sort and enumerate to find the first not matching

Or the simple answer with good constants for O(n lg n):

next(i for i, e in enumerate(sorted(A) + [ None ], 1) if i != e)

This might be the fastest of all if the list is almost sorted, because of how Python's Timsort works. For randomized input, the set-difference approach and iterating the first number not in the set are faster.

The + [ None ] is necessary for the edge case where there are no gaps (e.g. [1,2,3]).

This makes use of the properties of sets:

>>> l = [1,2,3,5,7,8,12,14]
>>> m = range(1, len(l) + 2)
>>> min(set(m)-set(l))
4

I would suggest using a generator expression with enumerate to find the missing element. This relies on the list being sorted:

>>> myList = sorted([1,14,2,5,3,7,8,12])
>>> next(a for a, b in enumerate(myList, myList[0]) if a != b)
4

enumerate pairs each element with a running index, so the goal is to find the first index that differs from its element. Note that I am also assuming that the elements may not start at a fixed value. If they always start at 1, as in this case, the expression simplifies to

>>> next(a for a, b in enumerate(myList, 1) if a != b)
4

Don't know how efficient this is, but why not use an xrange as a mask and use set subtraction?

>>> myList = [1,14,2,5,3,7,8,12]
>>> min(set(xrange(1, len(myList) + 1)) - set(myList))
4

You're only creating a set as big as myList, so it can't be that bad :)

This won't work for "full" lists:

>>> myList = range(1, 5)
>>> min(set(xrange(1, len(myList) + 1)) - set(myList))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: min() arg is an empty sequence

But the fix to return the next value is simple (add one more to the masked set):

>>> min(set(xrange(1, len(myList) + 2)) - set(myList))
5

import itertools as it

next(i for i in it.count(1) if i not in mylist)

I like this because it reads very closely to what you're trying to do: "start counting, keep going until you reach a number that isn't in the list, then tell me that number". However, this is quadratic since testing i not in mylist is linear.

Solutions using enumerate are linear, but rely on the list being sorted and no value being repeated. Sorting first makes it O(n log n) overall, which is still better than quadratic. However, you could instead put the values into a set first (which also copes with repeated values):

myset = set(mylist)
next(i for i in it.count(1) if i not in myset)

Since set containment checks take roughly constant time, this will be linear overall.
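The caveat about repeated values can be checked directly. The sketch below puts the sort-and-enumerate expression from this thread next to the set-based search. The function names first_missing_sorted and first_missing_set are made up for illustration, and the input is assumed to contain only integers.

```python
from itertools import count


def first_missing_sorted(values):
    # Sort-and-enumerate: correct only for distinct positive integers.
    return next(i for i, e in enumerate(sorted(values) + [None], 1) if i != e)


def first_missing_set(values):
    # Building a set drops repeats, so duplicates cannot shift the positions.
    seen = set(values)
    return next(i for i in count(1) if i not in seen)


dupes = [1, 1, 2]
print(first_missing_sorted(dupes))  # 2 (wrong: 2 is in the list)
print(first_missing_set(dupes))     # 3
```

With a repeated 1, the second position holds 1 instead of 2, so the enumerate version reports 2 even though 2 is present. The set version is unaffected.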

A for loop over the list will do it.

l = [1,14,2,5,3,7,8,12]
# len(l) + 1 is the answer when l holds exactly 1..len(l)
for i in range(1, len(l) + 2):
    if i not in l: break
print(i) # result 4

I just solved this in a probably non-Pythonic way:

def solution(A):
    # Const-ish to improve readability: smallest value that may be returned
    MIN = 1
    if not A: return MIN
    # Compute the maximum once instead of on every iteration
    MAX = max(A)
    # Try every candidate from 1 up to (but excluding) MAX, smallest first
    for num in range(1, MAX):
        # The first candidate missing from A is the answer
        # (to find the greatest missing number, start at MAX and count backwards)
        if num not in A: return num
    # No gap below MAX: return MAX + 1, or MIN if MAX is not positive
    return max(MIN, MAX+1)

I think it reads quite well.

My effort, no itertools. Sets "current" to be one less than the value you are expecting.

numbers = [1,2,3,4,5,7,8]  # must be sorted
current = numbers[0]-1
for i in numbers:
    if i != current+1:
        break
    current = i
# either the loop stopped at a gap, or current is the last element
print(current+1)

The naive way is to traverse the list, which is an O(n) solution. However, if the list is sorted and its values are distinct, you can use that property to perform a (modified) binary search. Basically, you are looking for the last position whose value still matches its position, i.e. the last index i where A[i] = i + 1 with zero-based indices.

The algorithm looks something like this:

def binarysearch(A):
    # A must be sorted and hold distinct positive integers
    start = 0
    end = len(A) - 1
    result = 0  # last value found at its expected position
    while start <= end:
        mid = (start + end) // 2
        if A[mid] == mid + 1:
            result = A[mid]
            start = mid + 1
        else:  # A[mid] > mid + 1, since A[mid] can never be smaller
            end = mid - 1
    return result + 1

This is an O(log n) solution. The idea is easiest to state for one-indexed lists (A[i] = i); the code above uses Python's zero-based indices, hence the mid + 1.

EDIT: if the list is not sorted, you can use the heapq Python library to store the list in a min-heap and then pop the elements one by one:

import heapq

def first_missing(A):  # assumes distinct positive integers
    H = list(A)        # copy, since heapify works in place
    heapq.heapify(H)
    count = 1
    while H:
        if heapq.heappop(H) != count:
            return count
        count += 1
    return count

sort + reduce to the rescue!

from functools  import reduce # python3
myList = [1,14,2,5,3,7,8,12]
res = 1 + reduce(lambda x, y: x if y-x>1 else y, sorted(myList), 0)
print(res)

Unfortunately, it won't stop after a match is found and will iterate over the whole list.

Faster (but less fun) is to use a for loop:

myList = [1,14,2,5,3,7,8,12]
res = 0
for num in sorted(myList):
    if num - res > 1:
        break
    res = num
res = res + 1
print(res)

You can try this:

arr1 = [1,14,2,5,3,7,8,12]
for i in range(1, max(arr1) + 2):
    if i not in arr1:
        print(i)
        break

The easiest way would be to loop through the sorted list and check whether the index (counting from 1) equals the value; if not, return the index as the solution. This has complexity O(n log n) because of the sorting:

def first_missing(myList):
    for index, value in enumerate(sorted(myList), 1):
        if index != value:
            return index
    return len(myList) + 1

Another option is to use Python sets, which are somewhat like dictionaries without values, just keys. In a dictionary you can look up a key in constant time, which makes the whole solution look like the following, with only linear complexity O(n):

def first_missing(myList):
    mySet = set(myList)
    for i in range(1, len(mySet) + 2):
        if i not in mySet:
            return i

Keep incrementing a counter in a loop until you find the first positive integer that's not in the list.

def getSmallestIntNotInList(number_list):
    """Returns the smallest positive integer that is not in a given list"""
    i = 0
    while True:
        i += 1
        if i not in number_list:
            return i    

print(getSmallestIntNotInList([1,14,2,5,3,7,8,12]))
# 4

I found that this had the fastest performance compared to other answers on this post. I tested using timeit in Python 3.10.8, calling each function on the 8-element example list. My performance results can be seen below:

import timeit

def findSmallestIntNotInList(number_list):
    # Infinite while-loop until first number is found
    i = 0
    while True:
        i += 1
        if i not in number_list:
            return i

t = timeit.Timer(lambda: findSmallestIntNotInList([1,14,2,5,3,7,8,12]))
print('Execution time:', t.timeit(100000), 'seconds')
# Execution time: 0.038100800011307 seconds

import timeit

def findSmallestIntNotInList(number_list):
    # Loop with a range to len(number_list)+1 
    for i in range (1, len(number_list)+1):
        if i not in number_list:
            return i

t = timeit.Timer(lambda: findSmallestIntNotInList([1,14,2,5,3,7,8,12]))
print('Execution time:', t.timeit(100000), 'seconds')
# Execution time: 0.05068870005197823 seconds

import timeit

def findSmallestIntNotInList(number_list):
    # Loop with a range to max(number_list) (by silgon)
    # https://stackoverflow.com/a/49649558/3357935
    for i in range (1, max(number_list)):
        if i not in number_list:
            return i

t = timeit.Timer(lambda: findSmallestIntNotInList([1,14,2,5,3,7,8,12]))
print('Execution time:', t.timeit(100000), 'seconds')
# Execution time: 0.06317249999847263 seconds

import timeit
from itertools import count, filterfalse

def findSmallestIntNotInList(number_list):
    # iterate the first number not in set (by Antti Haapala -- Слава Україні)
    # https://stackoverflow.com/a/28178803/3357935
    return(next(filterfalse(set(number_list).__contains__, count(1))))

t = timeit.Timer(lambda: findSmallestIntNotInList([1,14,2,5,3,7,8,12]))
print('Execution time:', t.timeit(100000), 'seconds')
# Execution time: 0.06515420007053763 seconds

import timeit

def findSmallestIntNotInList(number_list):
    # Use property of sets (by Bhargav Rao)
    # https://stackoverflow.com/a/28176962/3357935
    m = range(1, len(number_list))
    return min(set(m)-set(number_list))

t = timeit.Timer(lambda: findSmallestIntNotInList([1,14,2,5,3,7,8,12]))
print('Execution time:', t.timeit(100000), 'seconds')
# Execution time: 0.08586219989228994 seconds
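All of the timings above use the 8-element example list, where building a set can cost more than a handful of short list scans. Since membership tests on a list are linear, the ranking may well change as the input grows. A rough sketch of how the comparison could be repeated on a larger input follows. The names scan_list and scan_set and the 2000-element shuffled test data are assumptions made here, and no timings are claimed.

```python
import random
import timeit
from itertools import count, filterfalse


def scan_list(number_list):
    # Counter loop testing membership against the list itself (linear per test).
    i = 1
    while i in number_list:
        i += 1
    return i


def scan_set(number_list):
    # Same search, but membership is tested against a set built once.
    return next(filterfalse(set(number_list).__contains__, count(1)))


# Assumed test data: 1..2000 shuffled, so the first free value is 2001.
data = list(range(1, 2001))
random.shuffle(data)

# Both approaches must agree before their timings mean anything.
assert scan_list(data) == scan_set(data) == 2001

for func in (scan_list, scan_set):
    seconds = timeit.timeit(lambda: func(data), number=5)
    print(func.__name__, seconds)
```

Varying the list length and the position of the first gap should show where each approach wins on a given machine.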

A solution that returns all the unused values is

free_values = set(range(1, max(L))) - set(L)

It does a full scan, but those loops are implemented in C, and unless the list or its maximum value is huge this will beat more sophisticated algorithms that do their looping in Python.

Note that if this search is needed to implement "reuse" of IDs, then keeping a free list around and keeping it up to date (i.e. adding numbers to it when deleting entries and picking from it when reusing entries) is often a good idea.
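One possible shape for such a free list is a min-heap of released IDs plus a counter for IDs that have never been issued. The following is a minimal sketch under that assumption. IdAllocator, acquire and release are hypothetical names, and the sketch assumes each ID is released at most once while in use.

```python
import heapq


class IdAllocator:
    """Hypothetical allocator that hands out the smallest free positive ID."""

    def __init__(self):
        self._free = []     # min-heap of IDs that were released
        self._next_new = 1  # smallest ID that has never been handed out

    def acquire(self):
        # Released IDs are always smaller than _next_new, so prefer them.
        if self._free:
            return heapq.heappop(self._free)
        new_id = self._next_new
        self._next_new += 1
        return new_id

    def release(self, id_):
        # Assumes id_ is currently in use and is released only once.
        heapq.heappush(self._free, id_)


ids = IdAllocator()
a, b, c = ids.acquire(), ids.acquire(), ids.acquire()  # 1, 2, 3
ids.release(b)
print(ids.acquire())  # 2
print(ids.acquire())  # 4
```

Each acquire and release costs O(log k) for k released IDs, so no scan of the IDs in use is needed.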

The following solution loops over all numbers between 1 and the length of the input list and breaks out of the loop as soon as a number is not found in the list. Otherwise the result is the length of the list plus one.

listOfNumbers = [1,14,2,5,3,7,8,12]
for i in range(1, len(listOfNumbers) + 1):
    if i not in listOfNumbers:
        nextNumber = i
        break
else:
    nextNumber = len(listOfNumbers) + 1

Easy to read, easy to understand, gets the job done:

def solution(A):
    smallest = 1
    # set iteration order is not guaranteed, so visit the values in sorted order
    for value in sorted(set(A)):
        if value == smallest:
            smallest += 1
    return smallest
