Breaking out of a loop in Julia
I have a Vector of Vectors, W, whose elements have different lengths. These inner vectors contain integers between 0 and 150,000 in steps of 5, but they can also be empty. I am trying to compute the empirical CDF for each of those vectors. I could compute these CDFs by iterating over every vector and every integer, like this:
    cdfdict = Dict{Tuple{Int,Int},Float64}()
    for i in 1:length(W)
        v = W[i]
        len = length(v)
        if len == 0
            pcdf = 1.0
        else
            for j in 0:5:150_000
                pcdf = length(v[v .<= j])/len
                cdfdict[i, j] = pcdf
            end
        end
    end
However, this approach is inefficient because the CDF equals 1 for every j >= maximum(v), and sometimes maximum(v) is much lower than 150,000.
My question is: how can I include a condition that breaks out of the j loop for j > maximum(v) but still assigns pcdf = 1.0 for the remaining values of j?
I tried including a break when j > maximum(v), but this, of course, stops the loop from continuing over the remaining values of j. I could also break out of the loop and later use get! to look up or insert 1.0 for keys not found in cdfdict, but that is not what I'm looking for.
break only exits one level of looping. You can do what you want by wrapping the for loops in a function and using return where you would have put break, or by using @goto.
Alternatively, where you would break, you could set a boolean breakd = true and then break, and at the bottom of the outer loop add if breakd break end.
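The options described above can be sketched as follows. This is only an illustration on a toy problem (finding the first pair of integers whose product equals a target), not code from the question; the function names find_pair_return, find_pair_flag and find_pair_goto are hypothetical.

```julia
# Hypothetical toy problem: find the first (i, j) in 1:n x 1:n with i * j == target.

# Option 1: wrap the loops in a function; `return` leaves both loops at once.
function find_pair_return(n, target)
    for i in 1:n
        for j in 1:n
            if i * j == target
                return (i, j)
            end
        end
    end
    return nothing              # no pair found
end

# Option 2: set a flag, `break` the inner loop, then test the flag in the outer loop.
function find_pair_flag(n, target)
    result = nothing
    breakd = false
    for i in 1:n
        for j in 1:n
            if i * j == target
                result = (i, j)
                breakd = true
                break           # leaves only the inner loop
            end
        end
        if breakd
            break               # leaves the outer loop
        end
    end
    return result
end

# Option 3: jump to a label placed after the outer loop.
function find_pair_goto(n, target)
    result = nothing
    for i in 1:n
        for j in 1:n
            if i * j == target
                result = (i, j)
                @goto done
            end
        end
    end
    @label done
    return result
end
```

All three return the same pair when one exists and nothing otherwise. The function-plus-return form is usually the easiest to read.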
You can use another for loop to set all remaining elements to 1.0. The inner loop becomes:
    m = maximum(v)
    for j in 0:5:150_000
        if j > m
            for k in j:5:150_000
                cdfdict[i, k] = 1.0
            end
            break
        end
        pcdf = count(x -> x <= j, v)/len
        cdfdict[i, j] = pcdf
    end
However, this is rather hard to understand. It would be easier to use a branch. In fact, this should be just as fast because the branch is very predictable:
    m = maximum(v)
    for j in 0:5:150_000
        if j > m
            cdfdict[i, j] = 1.0
        else
            pcdf = count(x -> x <= j, v)/len
            cdfdict[i, j] = pcdf
        end
    end
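For completeness, here is a minimal sketch of how the branch-based inner loop might sit inside the full computation. The function name cdf_dict and the keyword arguments w and rmax are hypothetical. Storing 1.0 for empty vectors is an assumption, since the question's code does not store anything in that case.

```julia
# Hypothetical wrapper around the branch-based inner loop.
# `w` (step) and `rmax` (largest threshold) default to the values in the question.
function cdf_dict(W; w = 5, rmax = 150_000)
    cdfdict = Dict{Tuple{Int,Int},Float64}()
    for (i, v) in enumerate(W)
        len = length(v)
        # Assumption: an empty vector gets a CDF of 1.0 everywhere,
        # which `typemin(Int)` achieves because every j exceeds it.
        m = len == 0 ? typemin(Int) : maximum(v)
        for j in 0:w:rmax
            if j > m
                cdfdict[(i, j)] = 1.0      # all samples are below j
            else
                cdfdict[(i, j)] = count(x -> x <= j, v) / len
            end
        end
    end
    return cdfdict
end

# Small usage example with thresholds 0, 5, 10, 15, 20.
d = cdf_dict([[0, 5, 5, 20], Int[]]; rmax = 20)
d[(1, 5)]     # 3 of the 4 samples are <= 5
d[(2, 10)]    # empty vector, so 1.0 under the assumption above
```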
To elaborate on my comment, this answer details an implementation that fills an Array instead of a Dict.
First, create a random test case:
    # Broadcasting dots are required in Julia 1.x, where exp and floor no longer accept arrays.
    W = [rand(0:mv, rand(0:10)) for mv in floor.(Int, exp.(log(150_000) .* rand(10)))]
Next, create an array of the right size filled with 1.0s:
    cdfmat = ones(Float64,length(W),length(0:5:150_000));
Now fill in the beginning of each CDF:
    for i=1:length(W)
        v = sort(W[i])
        k = 1
        thresh = 0
        for j=1:length(v)
            if (j>1 && v[j]==v[j-1])
                continue
            end
            pcdf = (j-1)/length(v)
            while thresh<v[j]
                cdfmat[i,k]=pcdf
                k += 1
                thresh += 5
            end
        end
    end
This implementation uses sort, which can sometimes be slow, but the other implementations essentially compare the whole vector against every threshold, which is even slower in most cases.
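As a sanity check, the sort-based fill can be wrapped in a function and compared with the direct count on a small input. This is a sketch only: the function name cdf_matrix_sorted and the keyword arguments w and rmax are hypothetical, and it assumes every sample lies between 0 and rmax.

```julia
# Hypothetical function wrapping the sort-based fill.
# Assumption: all samples lie in 0:rmax, otherwise `k` would run past the last column.
function cdf_matrix_sorted(W; w = 5, rmax = 150_000)
    cdfmat = ones(Float64, length(W), length(0:w:rmax))
    for i in 1:length(W)
        v = sort(W[i])
        k = 1          # next column to fill
        thresh = 0     # threshold that belongs to column k
        for j in 1:length(v)
            j > 1 && v[j] == v[j-1] && continue   # skip repeated values
            pcdf = (j - 1) / length(v)            # fraction of samples below v[j]
            while thresh < v[j]
                cdfmat[i, k] = pcdf
                k += 1
                thresh += w
            end
        end
    end
    return cdfmat
end

# Compare with the direct definition on a tiny input (thresholds 0, 5, 10, 15, 20).
Wsmall = [[20, 0, 5, 5], Int[], [12]]
naive = [isempty(v) ? 1.0 : count(x -> x <= t, v) / length(v)
         for v in Wsmall, t in 0:5:20]
sorted_result = cdf_matrix_sorted(Wsmall; rmax = 20)
sorted_result == naive
```

Columns at or beyond the maximum of each vector are never written, so they keep the initial value of 1.0. This is how the implementation avoids the extra work the question asks about.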
Another answer gave an implementation using an Array, which calculated the CDF by sorting the samples and filling the CDF bins with quantile values. Since the whole Array is filled anyway, doing another pass over it should not be overly costly (we already tolerate a single pass). The sort and its accompanying allocation can be avoided by calculating a histogram in the array and using cumsum to produce a CDF. Perhaps the code will explain this better:
Initialize sizes, lengths and widths:
    n = 10; w = 5; rmax = 150_000; hl = length(0:w:rmax)
Produce a sample:
    # Broadcasting dots are required in Julia 1.x.
    W = [rand(0:mv, rand(0:10)) for mv in floor.(Int, exp.(log(rmax) .* rand(n)))];
Calculate the CDFs:
    cdfmat = zeros(Float64,n,hl);            # empty histograms
    for i=1:n                                # drop samples into histogram bins
        for j=1:length(W[i])
            cdfmat[i,1+(W[i][j]+w-1)÷w] += one(Float64)   # divide by the bin width w
        end
    end
    cumsum!(cdfmat,cdfmat,dims=2)            # calculate pre-CDF by cumsum (keyword form for Julia 1.x)
    for i=1:n                                # normalize each CDF by total
        if cdfmat[i,hl]==zero(Float64)       # check if histogram empty?
            for j=1:hl                       # CDF of 1.0 as default (might be changed)
                cdfmat[i,j] = one(Float64)
            end
        else                                 # the normalization factor is calculated once
            f = one(Float64)/cdfmat[i,hl]
            for j=1:hl
                cdfmat[i,j] *= f
            end
        end
    end
(a) Note the use of one and zero to prepare for a change of Real type; this is good practice.
(b) Adding @inbounds and @simd in the appropriate places should optimize this further.
(c) Putting this code in a function is recommended (this is not done in this answer).
(d) If a zero CDF for empty samples is acceptable (semantically, no samples then behaves like samples that are all huge), the second for loop can be simplified.
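Notes (b) and (c) could be combined along the following lines. This is a sketch under assumptions, not the answer's own code: the name cdf_matrix_hist and the keyword arguments are hypothetical, the cumulative sum is written as an explicit loop instead of calling cumsum!, every sample is assumed to lie between 0 and rmax, and empty samples are given a CDF of 1.0.

```julia
# Hypothetical function form of the histogram approach.
# Assumption: all samples lie in 0:rmax, so the computed column index is always valid.
function cdf_matrix_hist(W; w = 5, rmax = 150_000)
    n, hl = length(W), length(0:w:rmax)
    cdfmat = zeros(Float64, n, hl)             # empty histograms
    for i in 1:n, x in W[i]
        # Column of the smallest threshold that is >= x (bounds-checked on purpose).
        cdfmat[i, 1 + (x + w - 1) ÷ w] += 1
    end
    @inbounds for j in 2:hl, i in 1:n          # running sum along each row
        cdfmat[i, j] += cdfmat[i, j-1]
    end
    @inbounds for i in 1:n                     # normalize each row by its total
        total = cdfmat[i, hl]
        for j in 1:hl
            # Assumption: an empty sample gets a CDF of 1.0 everywhere.
            cdfmat[i, j] = total == 0 ? 1.0 : cdfmat[i, j] / total
        end
    end
    return cdfmat
end

# Compare with the direct definition on a tiny input (thresholds 0, 5, 10, 15, 20).
Wsmall = [[20, 0, 5, 5], Int[], [12]]
naive = [isempty(v) ? 1.0 : count(x -> x <= t, v) / length(v)
         for v in Wsmall, t in 0:5:20]
hist_result = cdf_matrix_hist(Wsmall; rmax = 20)
hist_result == naive
```

The running-sum loop iterates over rows in its inner loop because Julia stores matrices column by column, so consecutive accesses stay close in memory.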
See the other answers for more options, and a reminder: premature optimization is the root of all evil (Donald Knuth).