Breaking out of a loop in Julia

I have a Vector of Vectors of different lengths, W. The inner vectors contain integers between 0 and 150,000 in steps of 5, but they can also be empty. I am trying to compute the empirical CDF for each of those vectors. I could compute these CDFs by iterating over every vector and every integer like this:

cdfdict = Dict{Tuple{Int,Int},Float64}()
for i in 1:length(W)
    v = W[i]
    len = length(v)
    if len == 0
        pcdf = 1.0
    else
        for j in 0:5:150_000
            pcdf = length(v[v .<= j])/len
            cdfdict[i, j] = pcdf
        end
    end
end

However, this approach is inefficient because the CDF will be equal to 1 for j >= maximum(v), and sometimes this maximum(v) will be much lower than 150,000.

My question is: how can I include a condition that breaks out of the j loop for j > maximum(v) but still assigns pcdf = 1.0 for the remaining values of j?

I tried including a break when j > maximum(v), but this, of course, stops the loop before it reaches the remaining values of j. I could also break out of the loop and later use get! to access or insert 1.0 for keys not found in cdfdict, but that is not what I'm looking for.

break only exits one level of nesting, that is, the innermost loop. You can do what you want by wrapping the for loops in a function and using return where you would have put break, or by using @goto.
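As a minimal sketch of the return approach: the function name first_pair_above and the toy search condition are hypothetical, chosen only to show that return leaves both loops at once.

```julia
# Hypothetical helper: find the first (i, j) whose product exceeds `limit`.
function first_pair_above(limit)
    for i in 1:10
        for j in 1:10
            if i * j > limit
                return (i, j)   # exits both loops and the function
            end
        end
    end
    return nothing              # no pair matched
end

first_pair_above(20)
```

The cost of this pattern is that the nested loops have to live in their own function, which is usually a good idea in Julia anyway.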

Alternatively, where you would break, you could set a boolean flag (breakd = true) and then break, and at the bottom of the outer loop add if breakd break end.
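The flag pattern might look roughly like this. The function flagged_search and its search condition are made up for illustration; only the breakd flag comes from the answer.

```julia
# Hypothetical example: stop both loops as soon as i * j exceeds 20.
function flagged_search()
    result = (0, 0)
    for i in 1:10
        breakd = false          # reset the flag for every outer iteration
        for j in 1:10
            if i * j > 20
                result = (i, j)
                breakd = true   # remember that the inner loop was cut short
                break           # leaves the inner loop only
            end
        end
        if breakd
            break               # leaves the outer loop as well
        end
    end
    return result
end

flagged_search()
```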

You can use another for loop to set all remaining elements to 1.0. The inner loop becomes:

m = maximum(v)
for j in 0:5:150_000
    if j > m
        for k in j:5:150_000
            cdfdict[i, k] = 1.0
        end
        break
    end
    pcdf = count(x -> x <= j, v)/len
    cdfdict[i, j] = pcdf
end

However, this is rather hard to understand. It would be easier to use a branch. In fact, this should be just as fast because the branch is very predictable.

m = maximum(v)
for j in 0:5:150_000
    if j > m
        cdfdict[i, j] = 1.0
    else
        pcdf = count(x -> x <= j, v)/len
        cdfdict[i, j] = pcdf
    end
end
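For completeness, here is a sketch of the branch version wrapped in a function. The name cdf_dict and the keyword arguments step and rmax are hypothetical, and storing 1.0 for every j of an empty vector is an assumption (the question's code computes pcdf = 1.0 in that case but never stores it).

```julia
# Hypothetical wrapper around the branch version of the inner loop.
function cdf_dict(W; step = 5, rmax = 150_000)
    cdfdict = Dict{Tuple{Int,Int},Float64}()
    for (i, v) in enumerate(W)
        len = length(v)
        # Assumption: an empty vector gets a CDF of 1.0 everywhere,
        # so use a maximum below every possible j.
        m = len == 0 ? -1 : maximum(v)
        for j in 0:step:rmax
            if j > m
                cdfdict[(i, j)] = 1.0
            else
                cdfdict[(i, j)] = count(x -> x <= j, v) / len
            end
        end
    end
    return cdfdict
end

d = cdf_dict([[0, 5, 5, 10], Int[]]; rmax = 20)
```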

To elaborate on my comment, this answer details an implementation which fills an Array instead of a Dict.

First, create a random test case:

W = [rand(0:mv,rand(0:10)) for mv in floor.(Int,exp.(log(150_000).*rand(10)))]

Next, create an array of the right size filled with 1.0:

cdfmat = ones(Float64,length(W),length(0:5:150_000));

Now fill in the beginning of each CDF:

for i=1:length(W)
    v = sort(W[i])
    k = 1
    thresh = 0
    for j=1:length(v)
        if (j>1 && v[j]==v[j-1])
            continue
        end
        pcdf = (j-1)/length(v)
        while thresh<v[j]
            cdfmat[i,k]=pcdf
            k += 1
            thresh += 5
        end
    end
end

This implementation uses a sort, which can sometimes be slow, but the other implementations essentially compare the whole vector against every threshold value, which is even slower in most cases.
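A related variant, not the code from this answer: once the vector is sorted, Base's searchsortedlast gives the number of samples less than or equal to each threshold by binary search. The function name ecdf_sorted is hypothetical, and returning 1.0 for empty input is an assumption carried over from the question.

```julia
# Hypothetical helper: empirical CDF of `v` evaluated at each threshold.
function ecdf_sorted(v, thresholds)
    # Assumption: an empty sample gets a CDF of 1.0 everywhere.
    isempty(v) && return ones(Float64, length(thresholds))
    s = sort(v)
    # searchsortedlast(s, t) is the index of the last element <= t (0 if none),
    # which equals the number of samples <= t.
    return [searchsortedlast(s, t) / length(s) for t in thresholds]
end

ecdf_sorted([10, 0, 5, 5], 0:5:20)   # → [0.25, 0.75, 1.0, 1.0, 1.0]
```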

Another answer gave an implementation using an Array, which calculated the CDF by sorting the samples and filling the CDF bins with quantile values. Since the whole Array is filled anyway, doing another pass over it should not be overly costly (we already tolerate a single pass). The sort, and the allocation that comes with it, can be avoided by computing a histogram in the array and using cumsum to produce the CDF. Perhaps the code will explain this better.

Initialize sizes, lengths and widths:

n = 10; w = 5; rmax = 150_000; hl = length(0:w:rmax)

Produce a sample test case:

W = [rand(0:mv,rand(0:10)) for mv in floor.(Int,exp.(log(rmax).*rand(n)))];

Calculate the CDFs:

cdfmat = zeros(Float64,n,hl);  # empty histograms
for i=1:n                      # drop samples into histogram bins
  for j=1:length(W[i])
    cdfmat[i,1+(W[i][j]+w-1)÷w]+=one(Float64)
  end
end
cumsum!(cdfmat,cdfmat,dims=2)  # calculate pre-CDF by cumsum
for i=1:n                      # normalize each CDF by total 
  if cdfmat[i,hl]==zero(Float64) # check if histogram empty?
    for j=1:hl                 # CDF of 1.0 as default (might be changed)
      cdfmat[i,j] = one(Float64)
    end
  else                         # the normalization factor calc-ed once
    f = one(Float64)/cdfmat[i,hl]
    for j=1:hl
      cdfmat[i,j] *= f
    end
  end
end

(a) Note the use of one and zero to prepare for a change of Real type; this is good practice. (b) Adding @inbounds and @simd in suitable places should optimize this further. (c) Putting this code in a function is recommended (this is not done in this answer). (d) If a CDF of zero for empty samples is acceptable (semantically, this treats an empty sample as if all of its values were huge), then the second for loop can be simplified.
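As a rough sketch of points (b) and (c), the histogram approach could be wrapped in a function like the one below. The name ecdf_matrix and its keyword arguments are hypothetical, the running sum replaces the separate cumsum! and normalization passes, and the code assumes every sample lies in 0:rmax.

```julia
# Hypothetical wrapper: row i holds the empirical CDF of W[i] at 0:w:rmax.
function ecdf_matrix(W; w = 5, rmax = 150_000)
    n = length(W)
    hl = length(0:w:rmax)
    cdfmat = zeros(Float64, n, hl)
    for i in 1:n
        v = W[i]
        if isempty(v)
            cdfmat[i, :] .= 1.0      # assumption: empty sample means CDF of 1.0
            continue
        end
        for x in v                   # drop samples into histogram bins
            cdfmat[i, 1 + (x + w - 1) ÷ w] += 1.0
        end
        total = 0.0
        @inbounds for j in 1:hl      # running sum, normalized by sample count
            total += cdfmat[i, j]
            cdfmat[i, j] = total / length(v)
        end
    end
    return cdfmat
end

ecdf_matrix([[0, 5, 5, 10], Int[]]; rmax = 20)
```

The bin assignment is deliberately left without @inbounds, so a sample outside 0:rmax raises a BoundsError instead of corrupting memory.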

See other answers for more options, and a reminder: premature optimization is the root of all evil (Knuth??)
