Say I have this simple class:
class Foo(object):
    def __init__(self, number, name):
        self.number = number
        self.name = name
and a list of Foo instances:
l = [Foo(10, 'a'), Foo(9, 'a'), Foo(8, 'a'), Foo(7, 'a'), Foo(5, 'b'), Foo(4, 'b'), Foo(3, 'b')]
Say that the 'name' attribute can only be either 'a' or 'b'.
What is the fastest way to extract the sublist of all the objects whose 'name' is 'a' (or 'b')? Notice that this operation might be called several million times and this is why I want to optimize it as much as I can.
Note that the list is built so that all elements with the same 'name' are grouped together, in the first or second part of the list. The list is symmetric and ordered by decreasing 'number' attribute. EDIT: There is not necessarily the same number of 'a' and 'b' elements.
How I do it:
In the beginning I was just doing a for loop:
sublist = []
for o in l:
    if o.name == 'a':
        sublist.append(o)
Then I tried with a list comprehension:
sublist = [o for o in l if o.name=='a']
But this seems to be approximately the same if not a bit slower.
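For reference, this is roughly how the two versions can be timed with timeit. It is only a sketch: the function names with_loop and with_comprehension and the repeat count are placeholders I picked, and the numbers will vary by machine and Python version.

```python
import timeit


class Foo(object):
    def __init__(self, number, name):
        self.number = number
        self.name = name


l = [Foo(10, 'a'), Foo(9, 'a'), Foo(8, 'a'), Foo(7, 'a'),
     Foo(5, 'b'), Foo(4, 'b'), Foo(3, 'b')]


def with_loop(items):
    # Plain for loop with append
    sublist = []
    for o in items:
        if o.name == 'a':
            sublist.append(o)
    return sublist


def with_comprehension(items):
    # Same filter written as a list comprehension
    return [o for o in items if o.name == 'a']


if __name__ == '__main__':
    # number=100000 is an arbitrary repeat count, not a recommendation
    for func in (with_loop, with_comprehension):
        elapsed = timeit.timeit(lambda: func(l), number=100000)
        print('%s: %.4f s' % (func.__name__, elapsed))
```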
Either way, neither of those exploits the assumption that all the attributes are already 'grouped together' in the original (sorted) list. It will keep looping even when it's no longer necessary. Speed is very important so I need it to be as performant as possible.
Just break out of the loop once you hit a non-match after matching:
sublist = []
for o in l:
    if o.name == 'a':
        sublist.append(o)
    elif sublist:
        break
If you wanted to use generators, you could use the itertools functions:
from itertools import takewhile, dropwhile
sublist = list(takewhile(lambda o: o.name == 'a', dropwhile(lambda o: o.name != 'a', l)))
These both exploit the fact that the list is sorted and stop processing the list after the items stop matching.
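As a rough sketch, here are the two approaches wrapped in functions that take the wanted name as a parameter, so the same code can be checked for 'a' and for 'b'. The function names sublist_break and sublist_itertools and the target parameter are made up for this example.

```python
from itertools import dropwhile, takewhile


class Foo(object):
    def __init__(self, number, name):
        self.number = number
        self.name = name


def sublist_break(items, target):
    sublist = []
    for o in items:
        if o.name == target:
            sublist.append(o)
        elif sublist:
            # First non-match after a run of matches: the group has ended
            break
    return sublist


def sublist_itertools(items, target):
    # Skip the leading non-matches, then take the contiguous run of matches
    run_start = dropwhile(lambda o: o.name != target, items)
    return list(takewhile(lambda o: o.name == target, run_start))


l = [Foo(10, 'a'), Foo(9, 'a'), Foo(8, 'a'), Foo(7, 'a'),
     Foo(5, 'b'), Foo(4, 'b'), Foo(3, 'b')]

a_items = sublist_break(l, 'a')
b_items = sublist_itertools(l, 'b')
```

Note that the early exit only saves work for the group at the front of the list. When asking for 'b', both versions still have to walk past every 'a' before they reach the first match.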
Since the name attribute can only be 'a' or 'b' which are ordered and you have the same number of 'a' and 'b', the simplest way would be to find the middle point and slice the list:
mid = len(l) // 2
sublist = l[:mid]
The above will give you all 'a' while l[mid:] gives all 'b'.
Edit: Since the question was changed and the numbers of 'a' and 'b' elements are no longer guaranteed to be equal, the above answer no longer works.
Depending on the length of the list, my guess would be that either binary search (for longer lists) or breaking out of the loop as Brendan suggested (for shorter ones) would be the fastest approach.
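For the binary search option, the standard library's bisect module can do the searching. This is a minimal sketch, assuming the names appear in sorted order in the list ('a' before 'b'); NameView and split_by_name are hypothetical helper names, not part of any library. On Python 3.10 or newer, bisect_left also accepts a key argument, which would make the wrapper class unnecessary.

```python
from bisect import bisect_left


class Foo(object):
    def __init__(self, number, name):
        self.number = number
        self.name = name


class NameView(object):
    """Read-only view that exposes only the name of each item, looked up lazily."""

    def __init__(self, items):
        self._items = items

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        return self._items[index].name


def split_by_name(items):
    # Index of the first 'b'; everything before it is 'a'.
    # Only O(log N) names are inspected, none are copied.
    boundary = bisect_left(NameView(items), 'b')
    return items[:boundary], items[boundary:]


l = [Foo(10, 'a'), Foo(9, 'a'), Foo(8, 'a'), Foo(7, 'a'),
     Foo(5, 'b'), Foo(4, 'b'), Foo(3, 'b')]

a_items, b_items = split_by_name(l)
```

The search itself is logarithmic, but the slices still copy the selected elements, so the total cost stays proportional to the size of the returned sublist.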
Use binary search to find the middle point in O(logN):
In [19]: class Foo(object):
    ...:     def __init__(self, number, name):
    ...:         self.number = number
    ...:         self.name = name
    ...:
    ...:     def __repr__(self):
    ...:         return 'Foo(number={self.number}, name={self.name})'.format(self=self)
    ...:
In [20]: def binary_search(lst, predicate):
    ...:     """
    ...:     Finds the index of the first element for which predicate(x) == True
    ...:     """
    ...:     lo, hi = 0, len(lst)
    ...:     while lo < hi:
    ...:         mid = (lo + hi) // 2
    ...:         if predicate(lst[mid]):
    ...:             hi = mid
    ...:         else:
    ...:             lo = mid + 1
    ...:     return lo
    ...:
In [21]: l = [Foo(10, 'a'), Foo(9, 'a'), Foo(8, 'a'), Foo(7, 'a'), Foo(5, 'b'), Foo(4, 'b'), Foo(3, 'b')]
In [22]: binary_search(l, lambda x: x.name == 'b')
Out[22]: 4
In [23]: l[:binary_search(l, lambda x: x.name == 'b')]
Out[23]:
[Foo(number=10, name=a),
Foo(number=9, name=a),
Foo(number=8, name=a),
Foo(number=7, name=a)]
In [24]: l[binary_search(l, lambda x: x.name == 'b'):]
Out[24]: [Foo(number=5, name=b), Foo(number=4, name=b), Foo(number=3, name=b)]
However, note, that: