Replace a list of numbers with flat sub-ranges
Given a list of numbers, like this:
lst = [0, 10, 15, 17]
I'd like a list that has elements from i to i + 3 for all i in lst. If there are overlapping ranges, I'd like them merged.
So, for the example above, we first get:
[0, 1, 2, 3, 10, 11, 12, 13, 15, 16, 17, 18, 17, 18, 19, 20]
But for the last 2 groups, the ranges overlap, so upon merging them, you have:
[0, 1, 2, 3, 10, 11, 12, 13, 15, 16, 17, 18, 19, 20]
This is my desired output.
This is what I've thought of:
from collections import OrderedDict
res = list(OrderedDict.fromkeys([y for x in lst for y in range(x, x + 4)]).keys())
print(res)  # [0, 1, 2, 3, 10, 11, 12, 13, 15, 16, 17, 18, 19, 20]
However, this is slow (10000 loops, best of 3: 56 µs per loop). I'd like a NumPy solution if possible, or a Python solution that's faster than this.
Approach #1 : One approach based on broadcasted summation and then using np.unique to get unique numbers -
np.unique(np.asarray(lst)[:,None] + np.arange(4))
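As a quick sanity check (a minimal sketch using the sample lst from the question), the one-liner reproduces the desired output:

```python
import numpy as np

lst = [0, 10, 15, 17]

# Broadcast each start value against the offsets 0..3 (shape (4, 4)),
# then np.unique flattens, sorts, and deduplicates the result.
res = np.unique(np.asarray(lst)[:, None] + np.arange(4))
print(res.tolist())
# [0, 1, 2, 3, 10, 11, 12, 13, 15, 16, 17, 18, 19, 20]
```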
Approach #2 : Another based on broadcasted summation and then masking -
import numpy as np

def mask_app(lst, interval_len=4):
    arr = np.array(lst)
    r = np.arange(interval_len)
    # All candidate values: one row of arr[i] + [0 .. interval_len) per input.
    ranged_vals = arr[:, None] + r
    # Gap to the next start value; offsets that reach into the next row's
    # range are masked out, so overlapping values appear only once.
    a_diff = arr[1:] - arr[:-1]
    valid_mask = np.vstack((a_diff[:, None] > r, np.ones(interval_len, dtype=bool)))
    return ranged_vals[valid_mask]
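For the sample input from the question, the masked result matches the desired output (a self-contained check, assuming the input list is sorted as in the benchmark below):

```python
import numpy as np

def mask_app(lst, interval_len=4):
    # Broadcast starts against offsets, then mask away values that
    # spill into the next (overlapping) range.
    arr = np.array(lst)
    r = np.arange(interval_len)
    ranged_vals = arr[:, None] + r
    a_diff = arr[1:] - arr[:-1]
    valid_mask = np.vstack((a_diff[:, None] > r, np.ones(interval_len, dtype=bool)))
    return ranged_vals[valid_mask]

print(mask_app([0, 10, 15, 17]).tolist())
# [0, 1, 2, 3, 10, 11, 12, 13, 15, 16, 17, 18, 19, 20]
```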
Runtime test

Original approach -
from collections import OrderedDict
def org_app(lst):
    return list(OrderedDict.fromkeys([y for x in lst for y in range(x, x + 4)]).keys())
Timings -
In [409]: n = 10000
In [410]: lst = np.unique(np.random.randint(0,4*n,(n))).tolist()
In [411]: %timeit org_app(lst)
...: %timeit np.unique(np.asarray(lst)[:,None] + np.arange(4))
...: %timeit mask_app(lst, interval_len = 4)
...:
10 loops, best of 3: 32.7 ms per loop
1000 loops, best of 3: 1.03 ms per loop
1000 loops, best of 3: 671 µs per loop
In [412]: n = 100000
In [413]: lst = np.unique(np.random.randint(0,4*n,(n))).tolist()
In [414]: %timeit org_app(lst)
...: %timeit np.unique(np.asarray(lst)[:,None] + np.arange(4))
...: %timeit mask_app(lst, interval_len = 4)
...:
1 loop, best of 3: 350 ms per loop
100 loops, best of 3: 14.7 ms per loop
100 loops, best of 3: 9.73 ms per loop
The bottleneck with the two posted approaches seems to be the conversion to array, though that cost pays off well afterwards. Just to give a sense of the time spent on the conversion for the last dataset -
In [415]: %timeit np.array(lst)
100 loops, best of 3: 5.6 ms per loop
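If the input is already a NumPy array, that conversion cost disappears entirely; a minimal sketch (the name mask_app_arr is just an illustrative variant, and it assumes a sorted integer array, as produced by the benchmark setup) simply skips the np.array call:

```python
import numpy as np

def mask_app_arr(arr, interval_len=4):
    # Same masking idea as mask_app, but takes a sorted NumPy array
    # directly, avoiding the list-to-array conversion overhead.
    r = np.arange(interval_len)
    ranged_vals = arr[:, None] + r
    a_diff = arr[1:] - arr[:-1]
    valid_mask = np.vstack((a_diff[:, None] > r, np.ones(interval_len, dtype=bool)))
    return ranged_vals[valid_mask]

print(mask_app_arr(np.array([0, 10, 15, 17])).tolist())
# [0, 1, 2, 3, 10, 11, 12, 13, 15, 16, 17, 18, 19, 20]
```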