繁体   English   中英

Python - 根据另一个列表将列表拆分为子列表

[英]Python - split list in to sublists based on another list

我有两个列表: l1 = [0, 0.002, 0.3, 0.5, 0.6, 0.9, 1.3, 1.9]l2 = [0.5, 1.0, 1.5, 2.0] 我想将l1分割为子列表,这些子列表被定义为l2两个索引之间的元素。 因此,例如l1将等于[[0,0.002, 0.3], [0.5, 0.6, 0.9], [1.3], [1.9]]

这是我的解决方案:

l3 = []
b=0
for i in l2:
    temp = []
    for p in l1:
        if b <= p < i:
        temp.append(p)
    l3.append(temp)
    b+=0.5

这个解决方案是我代码中的一个巨大瓶颈。 有更快的方法吗?

您的列表已排序,因此无需在此处执行双循环。

以下内容基于两个列表作为输入生成子列表:

def partition(values, indices):
    idx = 0
    for index in indices:
        sublist = []
        while idx < len(values) and values[idx] < index:
            sublist.append(values[idx])
            idx += 1
        if sublist:
            yield sublist

然后,您可以遍历partition(l1, l2)以获取单个子列表,或者调用list()以一次性生成整个列表列表:

>>> l1 = [0, 0.002, 0.3, 0.5, 0.6, 0.9, 1.3, 1.9] 
>>> l2 = [0.5, 1.0, 1.5, 2.0]
>>> list(partition(l1, l2))
[[0, 0.002, 0.3], [0.5, 0.6, 0.9], [1.3], [1.9]]

作为一种快速的方法,您可以使用numpy非常有效的方式来处理大型列表:

>>> np.split(l1,np.searchsorted(l1,l2))
[array([ 0.   ,  0.002,  0.3  ]), array([ 0.5,  0.6,  0.9]), array([ 1.3]), array([ 1.9]), array([], dtype=float64)]

np.searchsorted将在l1找到l2项的索引,而l1保持排序(使用其默认排序), np.split将根据索引列表拆分列表。

一个基准,接受的答案列表1000倍大:

from timeit import timeit

s1="""

def partition(values, indices):
    idx = 0
    for index in indices:
        sublist = []
        while idx < len(values) and values[idx] < index:
            sublist.append(values[idx])
            idx += 1
        if sublist:
            yield sublist

l1 = [0, 0.002, 0.3, 0.5, 0.6, 0.9, 1.3, 1.9]*1000
l2 = [0.5, 1.0, 1.5, 2.0]
list(partition(l1, l2))

"""

s2="""
l1 = [0, 0.002, 0.3, 0.5, 0.6, 0.9, 1.3, 1.9]*1000
l2 = [0.5, 1.0, 1.5, 2.0]
np.split(l1,np.searchsorted(l1,l2))
   """

print '1st: ' ,timeit(stmt=s1, number=10000)
print '2nd : ',timeit(stmt=s2, number=10000,setup="import numpy as np")

结果:

1st:  17.5872459412
2nd :  10.3306460381
def split_l(a,b):
    it = iter(b)
    start, sub = next(it), []
    for ele in a:
        if ele >= start:
            yield sub
            sub, start = [], next(it)
        sub.append(ele)
    yield sub

print(list(split_l(l1,l2)))
[[0, 0.002, 0.3], [0.5, 0.6, 0.9], [1.3], [1.9]]

使用kasras输入这会击败接受的答案和numpy解决方案:

In [14]: l1 = [0, 0.002, 0.3, 0.5, 0.6, 0.9, 1.3, 1.9]*1000

In [15]: l1.sort()

In [16]: l2 = [0.5, 1.0, 1.5, 2.0]

In [17]: timeit list(partition(l1,l2))
1000 loops, best of 3: 1.53 ms per loop

In [18]: timeit list(split_l(l1,l2))
1000 loops, best of 3: 703 µs per loop

In [19]: timeit np.split(l1,np.searchsorted(l1,l2))
1000 loops, best of 3: 802 µs per loop

In [20]: list(split_l(l1,l2))  == list(partition(l1,l2))
Out[20]: True

创建一个本地引用以追加更多关闭:

def split_l(a, b):
    it = iter(b)
    start, sub = next(it), []
    append = sub.append
    for ele in a:
        if start <= ele:
            yield sub
            start, sub = next(it), []
            append = sub.append
        append(ele)
    yield sub

在numpy解决方案的时间运行:

In [47]: l1.sort()

In [48]: timeit list(split_l(l1,l2))
1000 loops, best of 3: 498 µs per loop

In [49]: timeit list(partition(l1,l2))
1000 loops, best of 3: 1.73 ms per loop

In [50]: timeit np.split(l1,np.searchsorted(l1,l2))
1000 loops, best of 3: 812 µs per loop
l1 = [0, 0.002, 0.3, 0.5, 0.6, 0.9, 1.3, 1.9]

l2 = [0.5, 1.0, 1.5, 2.0]


  def partition(values, indices):

    temp = []
    p_list = []


    for j in range(len(indices)):
        for i in range(len(values)):
            if indices[j] > values[i]:
                temp.append(values[i])

        p_list.append(temp)

        # added to the partition values are truncated from the list
        values = values[len(temp):]

        temp = []

    print(p_list)

分区(l1,l2)

[[0,0.002,0.3],[0.5,0.6,0.9],[1.3],[1.9]]

暂无
暂无

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM