I have two lists: l1 = [0, 0.002, 0.3, 0.5, 0.6, 0.9, 1.3, 1.9] and l2 = [0.5, 1.0, 1.5, 2.0]. I want to split l1 into sublists whose elements fall between consecutive values of l2. So, for example, l1 would become [[0, 0.002, 0.3], [0.5, 0.6, 0.9], [1.3], [1.9]].
Here is my solution:
l3 = []
b = 0
for i in l2:
    temp = []
    for p in l1:
        if b <= p < i:
            temp.append(p)
    l3.append(temp)
    b += 0.5
This solution is a huge bottleneck in my code. Is there a faster way to do this?
Your lists are sorted, so there is no need for a double loop here. The following generator produces the sublists from the two input lists:
def partition(values, indices):
    idx = 0
    for index in indices:
        sublist = []
        while idx < len(values) and values[idx] < index:
            sublist.append(values[idx])
            idx += 1
        if sublist:
            yield sublist
You can then iterate over partition(l1, l2) to get the sublists one at a time, or call list() on it to produce the whole list-of-lists in one go:
>>> l1 = [0, 0.002, 0.3, 0.5, 0.6, 0.9, 1.3, 1.9]
>>> l2 = [0.5, 1.0, 1.5, 2.0]
>>> list(partition(l1, l2))
[[0, 0.002, 0.3], [0.5, 0.6, 0.9], [1.3], [1.9]]
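One caveat worth noting (my own observation, not part of the original answer): this generator silently drops any values that lie at or beyond the last boundary in indices, and the if sublist guard means empty ranges produce no sublist at all:

```python
def partition(values, indices):
    # same generator as above
    idx = 0
    for index in indices:
        sublist = []
        while idx < len(values) and values[idx] < index:
            sublist.append(values[idx])
            idx += 1
        if sublist:
            yield sublist

# 5.0 lies beyond the last boundary (2.0) and is silently dropped;
# the empty range [1.0, 2.0) yields no sublist because of `if sublist`.
print(list(partition([0.1, 0.7, 5.0], [0.5, 1.0, 2.0])))
# [[0.1], [0.7]]
```

Whether that is the behaviour you want depends on whether l2 is guaranteed to cover the full range of l1, as it does in the question's data.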
For raw speed you can use numpy, which is probably the most efficient option for huge lists:
>>> np.split(l1,np.searchsorted(l1,l2))
[array([ 0. , 0.002, 0.3 ]), array([ 0.5, 0.6, 0.9]), array([ 1.3]), array([ 1.9]), array([], dtype=float64)]
np.searchsorted finds the indices at which the items of l2 would have to be inserted for l1 to remain sorted (with its default ordering), and np.split then splits the list at those indices.
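As an aside, the same boundary indices can be computed in the standard library with bisect.bisect_left, which is the pure-Python analogue of np.searchsorted. A minimal sketch (split_at_boundaries is my own name, not from any answer here):

```python
from bisect import bisect_left

def split_at_boundaries(values, boundaries):
    """Split sorted `values` at the insertion point of each boundary
    (the stdlib analogue of np.searchsorted + np.split)."""
    prev = 0
    out = []
    for b in boundaries:
        idx = bisect_left(values, b)  # first index with values[idx] >= b
        out.append(values[prev:idx])
        prev = idx
    return out

l1 = [0, 0.002, 0.3, 0.5, 0.6, 0.9, 1.3, 1.9]
l2 = [0.5, 1.0, 1.5, 2.0]
print(split_at_boundaries(l1, l2))
# [[0, 0.002, 0.3], [0.5, 0.6, 0.9], [1.3], [1.9]]
```

Unlike the numpy version it returns plain lists and produces no trailing empty array, but it still relies on both inputs being sorted.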
A benchmark against the accepted answer, on a list 1000 times bigger:
from timeit import timeit

s1 = """
def partition(values, indices):
    idx = 0
    for index in indices:
        sublist = []
        while idx < len(values) and values[idx] < index:
            sublist.append(values[idx])
            idx += 1
        if sublist:
            yield sublist

l1 = [0, 0.002, 0.3, 0.5, 0.6, 0.9, 1.3, 1.9]*1000
l2 = [0.5, 1.0, 1.5, 2.0]
list(partition(l1, l2))
"""
s2 = """
l1 = [0, 0.002, 0.3, 0.5, 0.6, 0.9, 1.3, 1.9]*1000
l2 = [0.5, 1.0, 1.5, 2.0]
np.split(l1, np.searchsorted(l1, l2))
"""
print('1st: ', timeit(stmt=s1, number=10000))
print('2nd: ', timeit(stmt=s2, number=10000, setup="import numpy as np"))
Result:
1st: 17.5872459412
2nd : 10.3306460381
def split_l(a, b):
    it = iter(b)
    start, sub = next(it), []
    for ele in a:
        if ele >= start:
            yield sub
            # default of inf keeps the generator from raising if an
            # element lies beyond the last boundary in b
            sub, start = [], next(it, float('inf'))
        sub.append(ele)
    yield sub

print(list(split_l(l1, l2)))
[[0, 0.002, 0.3], [0.5, 0.6, 0.9], [1.3], [1.9]]
Using Kasra's input (with l1 sorted), this beats both the accepted answer and the numpy solution:
In [14]: l1 = [0, 0.002, 0.3, 0.5, 0.6, 0.9, 1.3, 1.9]*1000
In [15]: l1.sort()
In [16]: l2 = [0.5, 1.0, 1.5, 2.0]
In [17]: timeit list(partition(l1,l2))
1000 loops, best of 3: 1.53 ms per loop
In [18]: timeit list(split_l(l1,l2))
1000 loops, best of 3: 703 µs per loop
In [19]: timeit np.split(l1,np.searchsorted(l1,l2))
1000 loops, best of 3: 802 µs per loop
In [20]: list(split_l(l1,l2)) == list(partition(l1,l2))
Out[20]: True
Creating a local reference to append knocks even more off:
def split_l(a, b):
    it = iter(b)
    start, sub = next(it), []
    append = sub.append
    for ele in a:
        if start <= ele:
            yield sub
            # default of inf guards against running past the last boundary
            start, sub = next(it, float('inf')), []
            append = sub.append
        append(ele)
    yield sub
Runs in a little over half the time of the numpy solution:
In [47]: l1.sort()
In [48]: timeit list(split_l(l1,l2))
1000 loops, best of 3: 498 µs per loop
In [49]: timeit list(partition(l1,l2))
1000 loops, best of 3: 1.73 ms per loop
In [50]: timeit np.split(l1,np.searchsorted(l1,l2))
1000 loops, best of 3: 812 µs per loop
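The local-reference trick above is a general CPython micro-optimisation: binding sub.append to a local name once avoids a repeated attribute lookup on every iteration. A minimal standalone sketch of the idea (the function names are mine, and the printed timings will vary by machine, which is why no expected numbers are shown):

```python
from timeit import timeit

def fill_plain(n):
    out = []
    for i in range(n):
        out.append(i)        # attribute lookup on every iteration
    return out

def fill_cached(n):
    out = []
    append = out.append      # bind the method once
    for i in range(n):
        append(i)            # plain local-name call
    return out

# Both build the same list; the cached version skips n attribute lookups.
assert fill_plain(1000) == fill_cached(1000)
print(timeit('fill_plain(1000)', globals=globals(), number=1000))
print(timeit('fill_cached(1000)', globals=globals(), number=1000))
```

The gain is small per call, so it only pays off in hot loops like the one in split_l.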
l1 = [0, 0.002, 0.3, 0.5, 0.6, 0.9, 1.3, 1.9]
l2 = [0.5, 1.0, 1.5, 2.0]

def partition(values, indices):
    temp = []
    p_list = []
    for j in range(len(indices)):
        for i in range(len(values)):
            if indices[j] > values[i]:
                temp.append(values[i])
        p_list.append(temp)
        # values already added to a partition are truncated from the list
        values = values[len(temp):]
        temp = []
    print(p_list)

partition(l1, l2)
[[0, 0.002, 0.3], [0.5, 0.6, 0.9], [1.3], [1.9]]