Find index of first occurrence in sorted list

I have a sorted list that looks like this:

sortedlist = ['0','0','0','1','1','1','2','2','3']

I also have a count variable:

count = '1'

*Note: sometimes count can be an integer greater than the max value in the list, for example count = '4'.

What I want to do is to find the first occurrence of the count in the list and print the index. If the value is greater than the max value in the list, then assign a string. Here is what I have tried:

maxvalue = max(sortedlist)
for i in sortedlist:
    if int(count) < int(sortedlist[int(i)]):
        indexval = i
        break
        OutputFile.write(''+str(indexval)+'\n')
if int(count) > int(maxvalue):
    indexval = "over"
    OutputFile.write(''+str(indexval)+'\n')

I thought the break would end the for loop, but I'm only getting results from the last if statement. Am I doing something incorrectly?

Your logic is wrong. You have a so-called sorted list of strings, which will not be in the right numeric order unless you compare the items as integers (as strings, '10' sorts before '9'). You should use integers from the get-go and bisect_left to find the index:

from bisect import bisect_left

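# Convert the strings to integers once, so they sort and compare numerically
sortedlist = sorted(map(int, ['0', '0', '0', '1', '1', '1', '2', '2', '3']))

count = 0

def get_val(lst, cn):
    # Every element is smaller than cn (or lst is empty): nothing to find
    if not lst or lst[-1] < cn:
        return "whatever"
    # Index of the first element >= cn
    return bisect_left(lst, cn)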

If the value falls between two values in the list, as per your requirement, you will get the index of the first occurrence of the next higher value; if there is an exact match, you will get the index of its first occurrence:

In [13]: lst = [0,0,2,2]

In [14]: get_val(lst, 1)
Out[14]: 2

In [15]: lst = [0,0,1,1,2,2,2,3]

In [16]: get_val(lst, 2)
Out[16]: 4

In [17]: get_val(lst, 9)
Out[17]: 'whatever'
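A quick sketch of the ordering problem mentioned above, using made-up sample values that include a two-digit number: strings compare character by character, so '10' sorts before '2' and '9', and bisect_left then returns positions in that lexicographic order rather than in numeric order.

```python
from bisect import bisect_left

# Made-up sample data with a two-digit value
as_strings = sorted(['0', '9', '10', '2'])
as_ints = sorted(map(int, ['0', '9', '10', '2']))

print(as_strings)  # ['0', '10', '2', '9']  (lexicographic order)
print(as_ints)     # [0, 2, 9, 10]          (numeric order)

# Index of the first element >= 9 in each list
print(bisect_left(as_strings, '9'))  # 3
print(bisect_left(as_ints, 9))       # 2
```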

As there are some over-complicated solutions here, it's worth showing how simply this can be done:

def get_index(a, L):
    for i, b in enumerate(L):
        # Compare as integers, so that e.g. '10' counts as greater than '9'
        if int(b) >= int(a):
            return i
    return "over"

>>> get_index('1', ['0','0','2','2','3'])
2
>>> get_index('1', ['0','0','0','1','2','3'])
3
>>> get_index('4', ['0','0','0','1','2','3'])
'over'

But use bisect.
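Following that advice, a bisect-based version of get_index might look like this. It is a sketch: the name get_index_bisect is hypothetical, and it assumes the list has already been converted to integers (done once, up front) so that the binary search compares numbers rather than strings.

```python
from bisect import bisect_left

def get_index_bisect(a, L):
    """Return the index of the first element >= a in sorted list L, or "over"."""
    i = bisect_left(L, a)  # binary search: O(log n) comparisons
    # bisect_left returns len(L) when every element is smaller than a
    # (this also covers an empty list)
    return i if i < len(L) else "over"

# Convert the question's strings to integers once, up front
nums = [int(x) for x in ['0', '0', '0', '1', '2', '3']]
print(get_index_bisect(1, nums))  # 3
print(get_index_bisect(4, nums))  # over
```

Checking the result against len(L) avoids a separate max() or lst[-1] comparison.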

You could use a function (following the EAFP principle) to find the first occurrence that is equal to or greater than the count:

In [239]: l = ['0','0','0','1','1','1','2','2','3']

In [240]: def get_index(count, sorted_list):
     ...:     try:
     ...:         return next(i for i, v in enumerate(sorted_list) if int(v) >= int(count))
     ...:     except StopIteration:
     ...:         return "over"
     ...:     

In [241]: get_index('3', l)
Out[241]: 8

In [242]: get_index('7', l)
Out[242]: 'over'
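As an alternative to catching StopIteration, the built-in next() accepts a second argument, a default value that it returns when the iterator is exhausted. A sketch of the same function written that way (like the fixed version above, it iterates over its sorted_list parameter):

```python
def get_index(count, sorted_list):
    """First index whose value is >= count, or "over" if there is none."""
    # next() returns the default ("over") instead of raising StopIteration
    # when the generator yields nothing.
    return next(
        (i for i, v in enumerate(sorted_list) if int(v) >= int(count)),
        "over",
    )

l = ['0', '0', '0', '1', '1', '1', '2', '2', '3']
print(get_index('3', l))  # 8
print(get_index('7', l))  # over
```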

As your list is already sorted, the maximum value is simply the last element: maxvalue = sortedlist[-1]. Secondly, there is an error in your for loop: for i in sortedlist: gives you each element of the list, not its index. To get the index, loop over range(len(sortedlist)). Finally, you should break after writing to the file, not before it. Below is the fixed code:

maxvalue = sortedlist[-1]
if int(count) > int(maxvalue):
    indexval = "over"
    OutputFile.write(str(indexval) + '\n')
else:
    for i in range(len(sortedlist)):
        if int(count) <= int(sortedlist[i]):
            indexval = i
            OutputFile.write(str(indexval) + '\n')
            break
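The separate max check can also be folded into the loop with Python's for...else clause, whose else block runs only when the loop finishes without hitting break. A sketch, assuming an in-memory io.StringIO buffer as a stand-in for the question's OutputFile:

```python
import io

sortedlist = ['0', '0', '0', '1', '1', '1', '2', '2', '3']
count = '1'

# Assumption: an in-memory buffer stands in for the question's OutputFile
OutputFile = io.StringIO()

for i, value in enumerate(sortedlist):
    if int(count) <= int(value):
        OutputFile.write(str(i) + '\n')  # write first, then leave the loop
        break
else:
    # Runs only if the loop did not break, i.e. no element was >= count
    OutputFile.write('over\n')

print(OutputFile.getvalue(), end='')  # 3
```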

Using itertools.dropwhile():

from itertools import dropwhile

sortedlist = [0, 0, 0, 1, 1, 1, 2, 2, 3]

def getindex(count):
    index = len(sortedlist) - len(list(dropwhile(lambda x: x < count, sortedlist)))
    return "some_string" if index >= len(sortedlist) else index

The test:

print(getindex(5))
> some_string

and:

print(getindex(3))
> 8
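To see what getindex computes step by step, here is a small worked example on a made-up list, printing the items dropwhile() keeps and the resulting index:

```python
from itertools import dropwhile

data = [0, 0, 1, 2, 2, 3]  # made-up sample list, already sorted
count = 2

# dropwhile skips the leading items for which x < count is True,
# then yields everything from the first item >= count onwards.
rest = list(dropwhile(lambda x: x < count, data))
print(rest)                   # [2, 2, 3]

# Removing len(rest) items from the end leaves exactly the skipped prefix,
# so its length is the index of the first item >= count.
print(len(data) - len(rest))  # 3
```

Building rest as a list only to take its length costs extra memory; sum(1 for _ in dropwhile(lambda x: x < count, data)) counts the remaining items without storing them.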

Explanation

dropwhile() drops items from the start of the list for as long as item < count is True, i.e. up to the first occurrence of a value >= count. Subtracting the number of remaining items from the length of the original list gives the index.

As the Python documentation puts it, dropwhile() makes "an iterator that drops elements from the iterable as long as the predicate is true; afterwards, returns every element."

First of all:

for i in range(1, 100):
  if i >= 3:
    break
    destroyTheInterwebz()
  print(i)

This will never call that last function. It will only print 1 and 2, because break leaves the loop immediately; it does not wait for the current iteration to finish.

In my opinion, the code would be nicer if you wrote a function (for example, indexOf) and used return instead of break.

Last but not least: the data structures here are fairly expensive, since every comparison has to convert a string with int(). You may want to use integers instead of strings, stored in a NumPy array. You could then use the very fast numpy.searchsorted function.
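A minimal sketch of the searchsorted approach, assuming NumPy is installed; the helper name index_or_over is hypothetical. With side='left', searchsorted returns the leftmost insertion point that keeps the array sorted, which is the index of the first element >= count.

```python
import numpy as np

def index_or_over(values, count):
    """Index of the first element >= count in the sorted array values, or "over"."""
    # side='left' gives the leftmost position where count could be inserted
    # while keeping values sorted, i.e. the index of the first element >= count.
    i = int(np.searchsorted(values, count, side='left'))
    # searchsorted returns len(values) when every element is smaller than count
    return i if i < len(values) else "over"

# Convert the question's strings to integers once
arr = np.array([int(x) for x in ['0', '0', '0', '1', '1', '1', '2', '2', '3']])
print(index_or_over(arr, 1))  # 3
print(index_or_over(arr, 4))  # over
```

searchsorted also accepts an array of search values: np.searchsorted(arr, [1, 4]) returns array([3, 9]), which is where it pays off when you have many counts to look up.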
