How to group/count list elements by range

Question

If my x list and y list are:

x = [10,20,30]
y = [1,2,3,15,22,27]

I'd like the return value to be a dictionary that counts, for each x value, the elements that are less than it (and not already counted under a smaller x value):

{
    10:3,
    20:1,
    30:2,
}

I have a very large list, so I was hoping there was a better way to do this that didn't involve a slow nested for loop. I've looked at collections.Counter and itertools, and neither seems to offer a way of grouping. Is there a built-in that can do this?

Answer 1

You can use the bisect module and collections.Counter:

>>> import bisect
>>> from collections import Counter
>>> Counter(x[bisect.bisect_left(x, item)] for item in y)
Counter({10: 3, 30: 2, 20: 1})
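
A few caveats with the one-liner above: x must be sorted, an item equal to a boundary is counted under that boundary (10 lands under 10), and an item larger than the last boundary raises IndexError. Below is a minimal sketch of a variant that uses bisect.bisect_right so that only strictly smaller items count, and that skips items at or beyond the last boundary. The function name count_below and the choice to drop out-of-range items are assumptions made for illustration:

```python
import bisect
from collections import Counter


def count_below(bounds, values):
    """Count values per range; a value v goes to the first bound b with v < b."""
    bounds = sorted(bounds)          # bisect requires sorted input
    counts = Counter()
    for v in values:
        i = bisect.bisect_right(bounds, v)   # index of first bound strictly > v
        if i < len(bounds):                  # assumption: drop values >= last bound
            counts[bounds[i]] += 1
    # Report every bound, including empty ranges, as a plain dict.
    return {b: counts[b] for b in bounds}


print(count_below([10, 20, 30], [1, 2, 3, 15, 22, 27]))
# {10: 3, 20: 1, 30: 2}
```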

Answer 2

If you're willing to use numpy, what you are asking for is basically a histogram:

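import numpy as np

x = [10, 20, 30]
y = [1, 2, 3, 15, 22, 27]

# Prepend 0 as the lower edge of the first bin (assumes no negative values in y)
np.histogram(y, bins=[0] + x)
#(array([3, 1, 2]), array([ 0, 10, 20, 30]))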

To make this a dict:

b = np.histogram(y,bins=[0]+x)[0]
d = { k:v for k,v in zip(x, b)}
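
np.histogram with bins=[0]+x ignores any value below 0, and its last bin is closed on the right, so a value equal to 30 would be counted under 30. As an alternative sketch (not this answer's method), np.searchsorted combined with np.bincount gives strict less-than ranges without needing a lower edge. The function name counts_by_bound is hypothetical:

```python
import numpy as np


def counts_by_bound(bounds, values):
    """Map each bound to the number of values below it and not below the previous bound."""
    bounds = np.sort(np.asarray(bounds))
    values = np.asarray(values)
    # For each value, index of the first bound strictly greater than it.
    idx = np.searchsorted(bounds, values, side="right")
    # The extra slot at index len(bounds) collects values >= the last bound.
    counts = np.bincount(idx, minlength=len(bounds) + 1)
    # tolist() converts NumPy integers to plain Python ints.
    return dict(zip(bounds.tolist(), counts[:len(bounds)].tolist()))


print(counts_by_bound([10, 20, 30], [1, 2, 3, 15, 22, 27]))
# {10: 3, 20: 1, 30: 2}
```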

For short lists this isn't worth it, but if your lists are long, it might be:

In [292]: y = np.random.randint(0, 30, 1000)

In [293]: %%timeit
   .....: b = np.histogram(y, bins=[0]+x)[0]
   .....: d = { k:v for k,v in zip(x, b)}
   .....: 
1000 loops, best of 3: 185 µs per loop

In [294]: y = list(y)

In [295]: timeit Counter(x[bisect.bisect_left(x, item)] for item in y)
100 loops, best of 3: 3.84 ms per loop

In [311]: timeit dict(zip(x, [[n_y for n_y in y if n_y < n_x] for n_x in x]))
100 loops, best of 3: 3.75 ms per loop
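
The timings above come from an IPython session and will vary by machine and library version. Here is a rough sketch for running a similar comparison with only the standard library's timeit module. The helper names and the data size are arbitrary choices for illustration, and no particular timing is implied:

```python
import bisect
import random
import timeit
from collections import Counter

x = [10, 20, 30]
random.seed(0)  # fixed seed so repeated runs use the same data
y = [random.randrange(0, 30) for _ in range(1000)]


def with_bisect():
    # Binary search per item: O(len(y) * log(len(x))).
    return Counter(x[bisect.bisect_left(x, item)] for item in y)


def with_nested_loop():
    # The nested loop the question wants to avoid: O(len(y) * len(x)).
    counts = Counter()
    for item in y:
        for bound in x:
            if item <= bound:   # same boundary rule as bisect_left
                counts[bound] += 1
                break
    return counts


assert with_bisect() == with_nested_loop()
for fn in (with_bisect, with_nested_loop):
    seconds = timeit.timeit(fn, number=100)
    print(f"{fn.__name__}: {seconds / 100 * 1e6:.0f} µs per call")
```

With only three boundaries the nested loop may well be competitive; the gap is expected to matter more as x grows.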

Answer 3

Short answer:

dict(zip(x, [[n_y for n_y in y if n_y < n_x] for n_x in x]))

Long answer:

First we need to iterate over the y's to check which members are less than a given value. If we do it for 10, we get this:

>>> [n_y for n_y in y if n_y < 10]
[1, 2, 3]

Then we need to turn that 10 into a variable by looping through the x's:

>>> [[n_y for n_y in y if n_y < n_x] for n_x in x]
[[1, 2, 3], [1, 2, 3, 15], [1, 2, 3, 15, 22, 27]]

Finally, we need to pair these results with the original x's. This is where zip comes in handy:

>>> # list() is needed on Python 3, where zip returns an iterator
>>> list(zip(x, [[n_y for n_y in y if n_y < n_x] for n_x in x]))
[(10, [1, 2, 3]), (20, [1, 2, 3, 15]), (30, [1, 2, 3, 15, 22, 27])]

This gives us a list of tuples, so we can call dict on it to get the final result:

>>> dict(zip(x, [[n_y for n_y in y if n_y < n_x] for n_x in x]))
{10: [1, 2, 3], 20: [1, 2, 3, 15], 30: [1, 2, 3, 15, 22, 27]}
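
Note that this result holds the matching elements themselves, and each list is cumulative (15 appears under both 20 and 30), whereas the question asks for per-range counts. One possible way to get from here to the counts, sketched with illustrative variable names, is to take the length of each list and subtract the length of the previous one:

```python
x = [10, 20, 30]
y = [1, 2, 3, 15, 22, 27]

# Cumulative lists, as built in the answer above.
below = dict(zip(x, [[n_y for n_y in y if n_y < n_x] for n_x in x]))

# Per-range count = size of this list minus size of the previous one.
counts = {}
previous = 0
for n_x in x:                 # assumes x is sorted in ascending order
    total = len(below[n_x])
    counts[n_x] = total - previous
    previous = total

print(counts)
# {10: 3, 20: 1, 30: 2}
```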

Answer 4

If the step between values in x is always 10, I would do it like this:

>>> y = [1,2,3,15,22,27]
>>> step = 10
>>> from collections import Counter
>>> Counter(n - n%step + step for n in y)
Counter({10: 3, 30: 2, 20: 1})
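
For integers, the expression n - n % step + step rounds each value up to the next multiple of step, so a value sitting exactly on a boundary (such as 10) is counted under the next one (20), which matches a strict less-than rule. Here is a small sketch wrapping this in a function, with the hypothetical name count_by_step, that returns a plain dict in ascending key order:

```python
from collections import Counter


def count_by_step(values, step):
    """Count values per range of width `step`, keyed by each range's upper bound."""
    counts = Counter(n - n % step + step for n in values)
    return dict(sorted(counts.items()))


print(count_by_step([1, 2, 3, 15, 22, 27], 10))
# {10: 3, 20: 1, 30: 2}
print(count_by_step([10, 19, 20], 10))
# {20: 2, 30: 1}
```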

4 answers

Answer 1
Score 8, accepted, 2013-09-13 17:05:04

Answer 2
Score 4, 2013-09-13 17:05:33

Answer 3
Score 1, 2013-09-13 17:16:53

Answer 4
Score 0, 2013-09-13 17:50:21
