Python - split list into sublists based on another list
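I have two lists: l1 = [0, 0.002, 0.3, 0.5, 0.6, 0.9, 1.3, 1.9] and l2 = [0.5, 1.0, 1.5, 2.0]. I want to split l1 into sublists, where each sublist holds the elements that fall between two consecutive values of l2. So, for example, l1 would become [[0, 0.002, 0.3], [0.5, 0.6, 0.9], [1.3], [1.9]].
Here is my solution: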
l3 = []
b = 0
for i in l2:
    temp = []
    for p in l1:
        if b <= p < i:
            temp.append(p)
    l3.append(temp)
    b += 0.5
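This solution is a huge bottleneck in my code. Is there a faster way?
Your lists are sorted, so there is no need for a double loop here.
The following generates the sublists from the two lists given as input: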
def partition(values, indices):
    idx = 0
    for index in indices:
        sublist = []
        while idx < len(values) and values[idx] < index:
            sublist.append(values[idx])
            idx += 1
        if sublist:
            yield sublist
You can then iterate over partition(l1, l2) to get the individual sublists, or call list() to produce the whole list of lists in one go:
>>> l1 = [0, 0.002, 0.3, 0.5, 0.6, 0.9, 1.3, 1.9]
>>> l2 = [0.5, 1.0, 1.5, 2.0]
>>> list(partition(l1, l2))
[[0, 0.002, 0.3], [0.5, 0.6, 0.9], [1.3], [1.9]]
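One detail that may matter when swapping this in for the loop in the question: because of the `if sublist:` check, partition() skips ranges that contain no elements, whereas the question's loop appends an empty list for them. A minimal sketch, using made-up input values with a gap between 0.5 and 1.0:

```python
def partition(values, indices):
    idx = 0
    for index in indices:
        sublist = []
        while idx < len(values) and values[idx] < index:
            sublist.append(values[idx])
            idx += 1
        if sublist:  # empty ranges are dropped here
            yield sublist

# Illustrative input (not from the question): nothing falls in [0.5, 1.0).
gappy = [0.1, 1.2]
bounds = [0.5, 1.0, 1.5]
result = list(partition(gappy, bounds))
print(result)  # [[0.1], [1.2]]
```

If the position of each sublist has to line up with the entries of l2, removing the `if sublist:` check and yielding unconditionally should preserve the empty ranges.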
As a faster approach, you can use numpy, which handles large lists very efficiently:
>>> import numpy as np
>>> np.split(l1, np.searchsorted(l1, l2))
[array([ 0. , 0.002, 0.3 ]), array([ 0.5, 0.6, 0.9]), array([ 1.3]), array([ 1.9]), array([], dtype=float64)]
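np.searchsorted finds the indices at which the items of l2 would be inserted into l1 so that l1 stays sorted (using its default side, 'left'), and np.split then splits the list at those indices. The trailing empty array appears because the last index, 8, equals the length of l1.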
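The same search-then-slice idea can be sketched without numpy using the standard library's bisect module, whose bisect_left returns the same leftmost insertion index. The helper name split_at_boundaries is hypothetical and not part of any answer here; it assumes both lists are sorted.

```python
from bisect import bisect_left

def split_at_boundaries(values, boundaries):
    """Split sorted `values` at each boundary (hypothetical helper)."""
    # bisect_left gives the leftmost index at which a boundary could be
    # inserted while keeping `values` sorted.
    cuts = [bisect_left(values, b) for b in boundaries]
    starts = [0] + cuts
    ends = cuts + [len(values)]
    # Slice between consecutive cut points, like np.split does.
    return [values[s:e] for s, e in zip(starts, ends)]

l1 = [0, 0.002, 0.3, 0.5, 0.6, 0.9, 1.3, 1.9]
l2 = [0.5, 1.0, 1.5, 2.0]
result = split_at_boundaries(l1, l2)
print(result)
# [[0, 0.002, 0.3], [0.5, 0.6, 0.9], [1.3], [1.9], []]
```

As with np.split, the final slice is empty here because the last boundary lies beyond every element; dropping it with result[:-1] or filtering out empty lists is a matter of taste.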
A benchmark against the accepted answer, with a list 1000 times larger:
from timeit import timeit
s1="""
def partition(values, indices):
    idx = 0
    for index in indices:
        sublist = []
        while idx < len(values) and values[idx] < index:
            sublist.append(values[idx])
            idx += 1
        if sublist:
            yield sublist
l1 = [0, 0.002, 0.3, 0.5, 0.6, 0.9, 1.3, 1.9]*1000
l2 = [0.5, 1.0, 1.5, 2.0]
list(partition(l1, l2))
"""
s2="""
l1 = [0, 0.002, 0.3, 0.5, 0.6, 0.9, 1.3, 1.9]*1000
l2 = [0.5, 1.0, 1.5, 2.0]
np.split(l1,np.searchsorted(l1,l2))
"""
print '1st: ' ,timeit(stmt=s1, number=10000)
print '2nd : ',timeit(stmt=s2, number=10000,setup="import numpy as np")
Results:
1st: 17.5872459412
2nd : 10.3306460381
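The benchmark above uses Python 2 print statements, and repeating l1 with * 1000 leaves the list unsorted, which both partition() and np.searchsorted assume it is not. A rough Python 3 sketch of a comparable timing on sorted input follows; it compares the single-pass generator with a nested-loop baseline modelled on the question's code. The function name nested and the repeat count are assumptions chosen for illustration, and actual timings will vary by machine.

```python
from timeit import timeit

def partition(values, indices):
    idx = 0
    for index in indices:
        sublist = []
        while idx < len(values) and values[idx] < index:
            sublist.append(values[idx])
            idx += 1
        if sublist:
            yield sublist

def nested(values, bounds):
    # Baseline in the spirit of the question: rescan all values per range.
    out, low = [], 0
    for high in bounds:
        out.append([v for v in values if low <= v < high])
        low = high
    return out

# Sort after repeating so the sortedness assumption actually holds.
l1 = sorted([0, 0.002, 0.3, 0.5, 0.6, 0.9, 1.3, 1.9] * 1000)
l2 = [0.5, 1.0, 1.5, 2.0]

t_nested = timeit(lambda: nested(l1, l2), number=20)
t_partition = timeit(lambda: list(partition(l1, l2)), number=20)
print("nested loops:", t_nested)
print("single pass: ", t_partition)
```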
def split_l(a, b):
    it = iter(b)
    start, sub = next(it), []
    for ele in a:
        if ele >= start:
            yield sub
            sub, start = [], next(it)
        sub.append(ele)
    yield sub
print(list(split_l(l1,l2)))
[[0, 0.002, 0.3], [0.5, 0.6, 0.9], [1.3], [1.9]]
Using kasra's input, this beats both the accepted answer and the numpy solution:
In [14]: l1 = [0, 0.002, 0.3, 0.5, 0.6, 0.9, 1.3, 1.9]*1000
In [15]: l1.sort()
In [16]: l2 = [0.5, 1.0, 1.5, 2.0]
In [17]: timeit list(partition(l1,l2))
1000 loops, best of 3: 1.53 ms per loop
In [18]: timeit list(split_l(l1,l2))
1000 loops, best of 3: 703 µs per loop
In [19]: timeit np.split(l1,np.searchsorted(l1,l2))
1000 loops, best of 3: 802 µs per loop
In [20]: list(split_l(l1,l2)) == list(partition(l1,l2))
Out[20]: True
Creating a local reference to append shaves off even more time:
def split_l(a, b):
    it = iter(b)
    start, sub = next(it), []
    append = sub.append
    for ele in a:
        if start <= ele:
            yield sub
            start, sub = next(it), []
            append = sub.append
        append(ele)
    yield sub
This runs in less time than the numpy solution:
In [47]: l1.sort()
In [48]: timeit list(split_l(l1,l2))
1000 loops, best of 3: 498 µs per loop
In [49]: timeit list(partition(l1,l2))
1000 loops, best of 3: 1.73 ms per loop
In [50]: timeit np.split(l1,np.searchsorted(l1,l2))
1000 loops, best of 3: 812 µs per loop
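One caveat with split_l as written: if an element of a is greater than or equal to the last value in b, next(it) runs out of boundaries, and on Python 3.7 and later a StopIteration escaping inside a generator is turned into a RuntimeError. A hedged variant that tolerates this is sketched below; the name split_l_safe is hypothetical, and unlike the original it also emits empty sublists when an element jumps over one or more boundaries.

```python
def split_l_safe(a, b):
    """Variant of split_l that survives values beyond the last boundary."""
    it = iter(b)
    # next(it, None) returns None instead of raising once b is exhausted.
    start, sub = next(it, None), []
    for ele in a:
        # Close one sublist for every boundary this element has reached.
        while start is not None and ele >= start:
            yield sub
            start, sub = next(it, None), []
        sub.append(ele)
    yield sub

l1 = [0, 0.002, 0.3, 0.5, 0.6, 0.9, 1.3, 1.9]
l2 = [0.5, 1.0, 1.5, 2.0]
same_as_before = list(split_l_safe(l1, l2))
print(same_as_before)
# [[0, 0.002, 0.3], [0.5, 0.6, 0.9], [1.3], [1.9]]

# Illustrative input where 2.5 lies past the last boundary.
beyond = list(split_l_safe([0.1, 2.5], [0.5, 1.0]))
print(beyond)  # [[0.1], [], [2.5]]
```

The extra None check costs a little per element, so for inputs known to stay below the last boundary the original version remains the leaner choice.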
l1 = [0, 0.002, 0.3, 0.5, 0.6, 0.9, 1.3, 1.9]
l2 = [0.5, 1.0, 1.5, 2.0]
def partition(values, indices):
    temp = []
    p_list = []
    for j in range(len(indices)):
        for i in range(len(values)):
            if indices[j] > values[i]:
                temp.append(values[i])
        p_list.append(temp)
        # values already added to a partition are truncated from the list
        values = values[len(temp):]
        temp = []
    print(p_list)
partition(l1, l2)
[[0, 0.002, 0.3], [0.5, 0.6, 0.9], [1.3], [1.9]]