If my x list and y list are:
x = [10,20,30]
y = [1,2,3,15,22,27]
I'd like a return value to be a dictionary that has a count of the elements that were less than the x value:
{
10:3,
20:1,
30:2,
}
I have a very large list, so I was hoping there was a better way to do it that didn't involve a slow nested for loop. I've looked at collections.Counter and itertools, and neither seems to offer a way of grouping. Is there a built-in that can do this?
You can use the bisect module and collections.Counter:
>>> import bisect
>>> from collections import Counter
>>> Counter(x[bisect.bisect_left(x, item)] for item in y)
Counter({10: 3, 30: 2, 20: 1})
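One caveat with the snippet above: bisect_left puts a value equal to an edge into that edge's own bin (so 10 would be counted under 10), and any value greater than the last edge makes the index run off the end of x. Here is a minimal sketch of a variant, assuming strictly-less-than semantics are wanted and that out-of-range values should simply be skipped. The function name count_below is made up for illustration.

```python
import bisect
from collections import Counter

def count_below(x, y):
    """Count items of y per bin, where bin x[i] holds x[i-1] <= item < x[i].

    Assumes x is sorted ascending. Items >= x[-1] are skipped, which is an
    assumption of this sketch, not something the question specifies.
    """
    counts = Counter()
    for item in y:
        # Index of the first edge strictly greater than item
        i = bisect.bisect_right(x, item)
        if i < len(x):
            counts[x[i]] += 1
    return dict(counts)

print(count_below([10, 20, 30], [1, 2, 3, 15, 22, 27]))
```

For the lists in the question this gives the same counts as the Counter one-liner; the two differ only when y contains a value equal to an edge or beyond the last edge.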
If you're willing to use numpy, basically you are asking for a histogram:
import numpy as np
x = [10,20,30]
y = [1,2,3,15,22,27]
# bins=[0]+x assumes y has no negative values
np.histogram(y,bins=[0]+x)
#(array([3, 1, 2]), array([ 0, 10, 20, 30]))
To make this a dict:
b = np.histogram(y,bins=[0]+x)[0]
d = { k:v for k,v in zip(x, b)}
For short lists, this isn't worth it, but if your lists are long, it might be:
In [292]: y = np.random.randint(0, 30, 1000)
In [293]: %%timeit
.....: b = np.histogram(y, bins=[0]+x)[0]
.....: d = { k:v for k,v in zip(x, b)}
.....:
1000 loops, best of 3: 185 µs per loop
In [294]: y = list(y)
In [295]: timeit Counter(x[bisect.bisect_left(x, item)] for item in y)
100 loops, best of 3: 3.84 ms per loop
In [311]: timeit dict(zip(x, [[n_y for n_y in y if n_y < n_x] for n_x in x]))
100 loops, best of 3: 3.75 ms per loop
Short answer (this gives the matching elements for each x value rather than their counts):
dict(zip(x, [[n_y for n_y in y if n_y < n_x] for n_x in x]))
Long answer:
First we need to iterate over y to find which members are less than a given value. For 10 we get this:
>>> [n_y for n_y in y if n_y < 10]
[1, 2, 3]
Then we need to turn that 10 into a variable by looping through the x's:
>>> [[n_y for n_y in y if n_y < n_x] for n_x in x]
[[1, 2, 3], [1, 2, 3, 15], [1, 2, 3, 15, 22, 27]]
Finally, we need to pair these results with the original x's. This is where zip comes in handy (in Python 3, wrap it in list to see the pairs):
>>> list(zip(x, [[n_y for n_y in y if n_y < n_x] for n_x in x]))
[(10, [1, 2, 3]), (20, [1, 2, 3, 15]), (30, [1, 2, 3, 15, 22, 27])]
This gives us a list of tuples, so we call dict on it to get the final result:
>>> dict(zip(x, [[n_y for n_y in y if n_y < n_x] for n_x in x]))
{10: [1, 2, 3], 20: [1, 2, 3, 15], 30: [1, 2, 3, 15, 22, 27]}
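Note that the result above holds lists of elements, and each list is cumulative (everything below 20 includes everything below 10). To get the per-bin counts the question asks for, one option is to count instead of collecting, then subtract neighbouring totals. A rough sketch, assuming x is sorted ascending; it swaps the inner loop for bisect on a sorted copy of y, which is my substitution and not part of the answer above:

```python
import bisect

x = [10, 20, 30]
y = [1, 2, 3, 15, 22, 27]

sorted_y = sorted(y)

# cumulative[i] = number of items in y strictly less than x[i]
cumulative = [bisect.bisect_left(sorted_y, n_x) for n_x in x]

# Subtract the previous cumulative count to isolate each bin
per_bin = [c - prev for c, prev in zip(cumulative, [0] + cumulative[:-1])]

print(dict(zip(x, cumulative)))  # {10: 3, 20: 4, 30: 6}
print(dict(zip(x, per_bin)))     # {10: 3, 20: 1, 30: 2}
```

Sorting y once and bisecting per x value avoids the nested loop, which matters when y is large.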
If the step between values in x is always 10, I would do it like this:
>>> y = [1,2,3,15,22,27]
>>> step = 10
>>> from collections import Counter
>>> Counter(n - n%step + step for n in y)
Counter({10: 3, 30: 2, 20: 1})
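To spell out the arithmetic: n - n % step rounds n down to a multiple of step, and adding step gives the upper edge of the bin. A small sketch wrapping this in a function (the name count_by_step is hypothetical), assuming the bins start at 0 and are evenly spaced:

```python
from collections import Counter

def count_by_step(y, step):
    # n - n % step rounds n down to a multiple of step; adding step yields
    # the bin's upper edge, so an exact multiple such as 10 lands in bin 20.
    return dict(Counter(n - n % step + step for n in y))

print(count_by_step([1, 2, 3, 15, 22, 27], 10))  # {10: 3, 20: 1, 30: 2}
```

Unlike the approaches that index into x, this never checks x at all, so a value such as 35 would create a key 40 even though 40 is not in x.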