Elements in list greater than or equal to elements in other list (without for loop?)
I have a list called x containing 1,000,000 numbers, and I would like to count how many of them are greater than or equal to each value in [0.5, 0.55, 0.60, ..., 1]. Is there a way to do it without a for loop?
Right now I have the following code, which works for one specific value from [0.5, ..., 1], say 0.5, and assigns the result to the variable count:
count=len([i for i in x if i >= 0.5])
EDIT: Basically, what I want to avoid is doing this, if possible:
obs = []
alpha = [0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1]
for a in alpha:
    count = len([i for i in x if i >= a])
    obs.append(count)
Thanks in advance. Best, Mikael
I don't think it's possible without a loop, but you can sort the list x and then use the bisect module (doc) to locate the insertion point (index) of each threshold.
For example:
x = [0.341, 0.423, 0.678, 0.999, 0.523, 0.751, 0.7]
alpha = [0.5,0.55,0.6,0.65,0.7,0.75,0.8,0.85,0.9,0.95,1]
x = sorted(x)
import bisect
obs = [len(x) - bisect.bisect_left(x, a) for a in alpha]
print(obs)
Will print:
[5, 4, 4, 4, 3, 2, 1, 1, 1, 1, 0]
Note: sorted() has complexity O(n log(n)) and each bisect_left() call is O(log(n)).
You can use NumPy and boolean indexing:
>>> import numpy as np
>>> a = np.array(list(range(100)))
>>> a[a>=50].size
50
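The same comparison can cover every threshold at once through broadcasting. The following is a minimal sketch rather than part of the original answer; the sample data and the names `thresholds`, `mask` and `counts` are made up for illustration.

```python
import numpy as np

# Small illustrative data set (assumed values, not the asker's real data)
x = np.array([0.341, 0.423, 0.678, 0.999, 0.523, 0.751, 0.7])
thresholds = np.array([0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.0])

# x[:, None] has shape (N, 1) and thresholds has shape (M,), so the
# comparison broadcasts to an (N, M) boolean array
mask = x[:, None] >= thresholds

# Count the True entries in each column: one count per threshold
counts = np.count_nonzero(mask, axis=0)
print(counts.tolist())  # [5, 4, 4, 4, 3, 2, 1, 1, 1, 1, 0]
```

This builds an N x M temporary array, so it trades memory for simplicity: with 1,000,000 values and 11 thresholds that is roughly 11 MB of booleans.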
EDIT: If you are already using NumPy, you can simply do this:
import numpy as np
# Make random data
np.random.seed(0)
x = np.random.binomial(n=20, p=0.5, size=1000000) / 20
bins = np.arange(0.55, 1.01, 0.05)
# One extra value for the upper bound of last bin
bins = np.append(bins, max(bins.max(), x.max()) + 1)
h, _ = np.histogram(x, bins)
result = np.cumsum(h)
print(result)
# [280645 354806 391658 406410 411048 412152 412356 412377 412378 412378]
If you are dealing with large arrays of numbers, you may consider using NumPy. But if you are using plain Python lists, you can do it like this, for example:
def how_many_bigger(nums, mins):
    # List of counts for each minimum
    counts = [0] * len(mins)
    # For each number
    for n in nums:
        # For each minimum
        for i, m in enumerate(mins):
            # Add 1 to the count if the number is greater than or equal to the current minimum
            if n >= m:
                counts[i] += 1
    return counts
# Test
import random
# Make random data
random.seed(0)
nums = [random.random() for _ in range(1_000_000)]
# Make minimums
mins = [i / 100. for i in range(55, 101, 5)]
print(mins)
# [0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.0]
count = how_many_bigger(nums, mins)
print(count)
# [449771, 399555, 349543, 299687, 249605, 199774, 149945, 99928, 49670, 0]
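If the minimums are sorted in ascending order, the inner loop over them can be replaced by one binary search per number. This is a hedged sketch of that idea, not the answer's own method; `how_many_at_least` is a hypothetical helper name, and the smaller sample size is chosen only to keep the check quick.

```python
from bisect import bisect_right
import random

def how_many_at_least(nums, mins):
    """Count, for each minimum, how many numbers are >= it.

    Assumes `mins` is sorted in ascending order.
    """
    # buckets[k] = how many numbers are >= exactly k of the minimums
    buckets = [0] * (len(mins) + 1)
    for n in nums:
        buckets[bisect_right(mins, n)] += 1
    # A number that clears k minimums counts toward mins[0] .. mins[k - 1],
    # so counts[i] is the sum of buckets[i + 1:], built as a running suffix sum
    counts = [0] * len(mins)
    running = 0
    for i in range(len(mins) - 1, -1, -1):
        running += buckets[i + 1]
        counts[i] = running
    return counts

# Check against a straightforward count on random data
random.seed(0)
nums = [random.random() for _ in range(10_000)]
mins = [i / 100 for i in range(55, 101, 5)]
fast = how_many_at_least(nums, mins)
slow = [sum(1 for n in nums if n >= m) for m in mins]
print(fast == slow)  # True
```

This does O(N log M) work instead of O(N * M), which only matters once the list of minimums grows well beyond the ten values used here.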
Even if you don't write a for loop yourself, the built-in functions still loop internally, though they iterate efficiently.
You can use the code below without writing a for loop yourself:
x = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
# Keep the elements at or above the threshold, then count them
matches = list(filter(lambda v: v >= 0.5, x))
print(matches)
print(len(matches))
Based on the comments, you're OK with using NumPy, so use np.searchsorted to find where each value of alpha would be inserted into a sorted version of x. Each insertion index is the number of elements below that threshold, so subtracting it from x.size gives your counts.
If x is a NumPy array and you're OK with sorting it in place:
x.sort()
counts = x.size - np.searchsorted(x, alpha)
If not:
counts = x.size - np.searchsorted(np.sort(x), alpha)
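With the default side='left', np.searchsorted returns the number of elements with x < alpha, so the counts above are for x >= alpha. To count x > alpha instead, add the keyword side='right', which makes each index the number of elements with x <= alpha: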
np.searchsorted(x, alpha, side='right')
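To see how `side` treats values exactly equal to a threshold, here is a small sketch. The data is invented for illustration, and the names `xs`, `at_least` and `above` are not from the original answer.

```python
import numpy as np

# Illustrative data with values exactly equal to some thresholds
x = np.array([0.5, 0.5, 0.6, 0.7, 0.9, 1.0])
alpha = np.array([0.5, 0.7, 1.0])

xs = np.sort(x)  # sorted copy; x itself is left untouched

# side='left' (default): index = number of elements strictly below alpha
at_least = xs.size - np.searchsorted(xs, alpha, side='left')
# side='right': index = number of elements at or below alpha
above = xs.size - np.searchsorted(xs, alpha, side='right')

print(at_least.tolist())  # [6, 3, 1]  -> counts of x >= alpha
print(above.tolist())     # [4, 2, 0]  -> counts of x > alpha
```

For the question as asked (equal to or above), the default `side='left'` is the variant that includes ties.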
P.S.
There are a couple of significant problems with the line
count = len([i for i in x if i >= 0.5])
First of all, you're creating a list of all the matching elements instead of just counting them. To count them, do
count = sum(1 for i in x if i >= threshold)
The other problem is that you are doing a linear pass through the entire array for each alpha, which is not necessary.
As I commented under @Andrej Kesely's answer, let's say we have N = len(x) and M = len(alpha). Your implementation has O(M * N) time complexity, while sorting gives you O((M + N) log N). For M << N (small alpha), your complexity is approximately O(N), which beats O(N log N). But for M ~= N, yours approaches O(N^2) vs my O(N log N).
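A rough way to observe this trade-off is to time both approaches on the same data. The sketch below uses plain lists and the standard-library bisect module; the sizes N and M are assumptions picked only for illustration, and the timings will vary by machine, so no expected numbers are shown.

```python
import random
import time
from bisect import bisect_left

random.seed(0)
N, M = 50_000, 100  # assumed sizes, chosen only for illustration
x = [random.random() for _ in range(N)]
alpha = [0.5 + 0.5 * j / (M - 1) for j in range(M)]

# O(M * N): one full pass over x per threshold
start = time.perf_counter()
naive = [sum(1 for v in x if v >= a) for a in alpha]
t_naive = time.perf_counter() - start

# O((M + N) log N): sort once, then one binary search per threshold
start = time.perf_counter()
xs = sorted(x)
fast = [len(xs) - bisect_left(xs, a) for a in alpha]
t_sorted = time.perf_counter() - start

print(naive == fast)  # both approaches should agree
print(f"per-threshold pass: {t_naive:.3f} s, sort + bisect: {t_sorted:.3f} s")
```

Shrinking M toward 1 or growing it toward N should show the balance between the two shifting in the direction described above.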