
Loop function is taking too long

I'm trying to write a function that inverts the sum of the first n cubes:

1^3 + 2^3 + 3^3 + ... + n^3 = sum

My function should receive a sum and return n, or -1 if no such n exists.

Some examples:

(find-n 9)   ; should return 2 because 1^3 + 2^3 = 9
(find-n 100) ; should return 4 because 1^3 + 2^3 + 3^3 + 4^3 = 100
(find-n 10)  ; should return -1

After some work I wrote these two functions:

; aux function
(defn exp-3 [base] (apply *' (take 3 (repeat base))))

; main function
(defn find-n [m]
  (loop [sum 0
         actual-base 0]
       (if (= sum m) 
           actual-base
           (if (> sum m)
               -1
               (recur (+' sum (exp-3 (inc actual-base))) (inc actual-base))))))

These functions work correctly, but they take too long on large (BigInt) inputs, for example:

(def sum 1025247423603083074023000250000N)
(time (find-n sum))
; => "Elapsed time: 42655.138544 msecs"
; => 45001000

I'm asking this question to get some advice on how I can make this function faster.

This is all about algebra, and has little to do with Clojure or programming. Since this site does not support mathematical typography, let's express it in Clojure.

Define

(defn sigma [coll] (reduce + coll))

and

(defn sigma-1-to-n [f n]
  (sigma (map f (rest (range (inc n))))))

(or, equivalently,

(defn sigma-1-to-n [f n]
  (->> n inc range rest (map f) sigma))

)

Then the question is, given n, to find i such that (= (sigma-1-to-n #(* % % %) i) n).

The key to doing this quickly is Faulhaber's formula for cubes. It tells us that the following three expressions are equal for any natural number i:

(#(*' % %) (sigma-1-to-n identity i))

(sigma-1-to-n #(* % % %) i)

(#(*' % %) (/ (*' i (inc i)) 2))
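
As a quick sanity check of this identity, here is a small sketch (not part of the original answer; the helper names sum-of-cubes and triangular are introduced only for illustration) that compares the term-by-term sum of cubes with the squared triangular number for small i:

```clojure
;; Hypothetical helper names, introduced only for this check.
(defn sum-of-cubes
  "Sum 1^3 + 2^3 + ... + i^3, computed term by term."
  [i]
  (reduce +' (map #(*' % % %) (range 1 (inc i)))))

(defn triangular
  "Sum 1 + 2 + ... + i, via the closed form i(i+1)/2."
  [i]
  (/ (*' i (inc i)) 2))

;; Both sides agree for every i from 0 to 200.
(every? #(= (sum-of-cubes %) (let [t (triangular %)] (*' t t)))
        (range 201))
;; => true
```

A finite check like this is not a proof, of course, but it guards against misremembering the formula.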

So, to be a sum of cubes, the number

  • must be a perfect square
  • whose square root is the sum of the first j natural numbers, for some j.

To find out whether a whole number is a perfect square, we take its approximate floating-point square root and see whether squaring the nearest integer recovers our whole number:

(defn perfect-square-root [n]
  (let [candidate (-> n double Math/sqrt Math/round)]
    (when (= (*' candidate candidate) n)
      candidate)))

This returns nil if the argument is not a perfect square.

Now that we have the square root, we have to determine whether it is the sum of the first so many natural numbers: in ordinary algebra, is it (j (j + 1)) / 2 for some natural number j?

We can use a similar trick to answer this question directly.

j (j + 1) = (j + 1/2)^2 - 1/4

so if n = j (j + 1) / 2, then j = sqrt(2 n + 1/4) - 1/2. The following function therefore returns the number of successive numbers, starting from 1, that add up to the argument, if there is one:

(defn perfect-sum-of [n]
  (let [j (-> n (*' 2)
                (+ 1/4)      ; (j + 1/2)^2 = 2n + 1/4
                double
                Math/sqrt
                (- 0.5)
                Math/round)]
    (when (= (/ (*' j (inc j)) 2) n)
      j)))

We can combine these to do what you want (except that this version returns nil, rather than -1, when there is no such n):

(defn find-n [big-i]
  {:pre [(integer? big-i) ((complement neg?) big-i)]}
  (let [sqrt (perfect-square-root big-i)]
    (and sqrt (perfect-sum-of sqrt))))

(def sum 1025247423603083074023000250000N)

(time (find-n sum))
"Elapsed time: 0.043095 msecs"
=> 45001000

(Notice that the time is about twenty times faster than before, probably because HotSpot has got to work on find-n, which has been thoroughly exercised by the appended testing.)

This is obviously a lot faster than the original.


Caveat

I was concerned that the above procedure might produce false negatives (it will never produce a false positive) on account of the finite precision of floating point. However, testing suggests that the procedure is reliable for the sort of number the question uses.


A Java double has a 53-bit significand (52 bits stored explicitly), which is roughly 15.9 decimal digits. The concern is that with numbers much bigger than this, the procedure may miss the exact integer solution, as the rounding can only be as accurate as the floating-point number that it starts with.

However, the procedure solves the question's example, a 31-digit integer, correctly. And testing with many (ten million!) similar numbers produces not one failure.


To test the solution, we generate a (lazy) sequence of [limit cube-sum] pairs:

(defn generator [limit cube-sum]
  (iterate
    (fn [[l cs]]
      (let [l (inc l)
            cs (+' cs (*' l l l))]
        [l cs]))                  ; return the incremented limit, not the starting one
    [limit cube-sum]))

For example,

(take 10 (generator 0 0))
=> ([0 0] [1 1] [2 9] [3 36] [4 100] [5 225] [6 441] [7 784] [8 1296] [9 2025])

Now we

  • start with the given example,
  • try the next ten million cases and
  • remove the ones that work.

So

(remove (fn [[l cs]] (= (find-n cs) l)) (take 10000000 (generator 45001000 1025247423603083074023000250000N)))
=> () 

They all work. No failures. Just to make sure our test is valid, we deliberately mismatch the limit and the sum:

(remove (fn [[l cs]] (= (find-n cs) l)) (take 10 (generator 45001001 1025247423603083074023000250000N)))
=>
([45001001 1025247423603083074023000250000N]
 [45001002 1025247514734170359564546262008N]
 [45001003 1025247605865263720376770289035N]
 [45001004 1025247696996363156459942337099N]
 [45001005 1025247788127468667814332412224N]
 [45001006 1025247879258580254440210520440N]
 [45001007 1025247970389697916337846667783N]
 [45001008 1025248061520821653507510860295N]
 [45001009 1025248152651951465949473104024N]
 [45001010 1025248243783087353664003405024N])

All ought to fail, and they do.
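
If the floating-point caveat above is still a worry for much larger inputs, one alternative is to stay in exact integer arithmetic throughout. The following is only a sketch under stated assumptions: it relies on the `sqrt` method of `java.math.BigInteger`, which exists from Java 9 onwards, and the names isqrt and find-n-exact are hypothetical, not part of the answer above. It uses the fact that r = j(j+1)/2 exactly when 8r + 1 = (2j + 1)^2.

```clojure
;; Sketch only: exact integer arithmetic, no doubles involved.
;; Assumes Java 9 or later, where java.math.BigInteger has a .sqrt method.
;; The names isqrt and find-n-exact are hypothetical.
(defn isqrt
  "Floor of the square root of a non-negative integer n."
  [n]
  (bigint (.sqrt (biginteger n))))

(defn find-n-exact
  "Returns j with 1^3 + ... + j^3 = m, or nil if there is none.
  Assumes m is a non-negative integer."
  [m]
  (let [r (isqrt m)]                       ; candidate for j(j+1)/2
    (when (= (* r r) m)                    ; m must be a perfect square
      (let [s (isqrt (inc (* 8 r)))        ; if r is triangular, s = 2j + 1
            j (quot (dec s) 2)]
        (when (= (* j (inc j)) (* 2 r))    ; exact check that r = j(j+1)/2
          j)))))

(= 2 (find-n-exact 9))
;; => true
(find-n-exact 10)
;; => nil
```

Like the find-n above, this returns nil rather than -1 when no solution exists; the result is a BigInt, which compares equal to the corresponding long under `=`.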

Just avoiding the apply (not really all that fast in CLJ) gives you a 4x speedup:

(defn exp-3 [base]
  (*' base base base))

And another 10%:

(defn find-n [m]
  (loop [sum 0
         actual-base 0]
    (if (>= sum m)
      (if (= sum m) actual-base -1)
      (let [nb (inc actual-base)]
        (recur (+' sum (*' nb nb nb)) nb)))))

The following algorithmic approach relies on one simple formula, which says that the sum of the cubes of the first N natural numbers is (N*(N+1)/2)^2:

(defn sum-of-cube
  "(n*(n+1)/2)^2"
  [n]
  (let [n' (/ (*' n (inc n)) 2)]
    (*' n' n')))

(defn find-nth-cube
  [n]
  ((fn [start end prev]
     (let [avg (bigint (/ (+' start end) 2))
           cube (sum-of-cube avg)]
       (cond (== cube n) avg
             (== cube prev) -1
             (> cube n) (recur start avg cube)
             (< cube n) (recur avg end cube))))
    1 n -1))

(time (find-nth-cube 1025247423603083074023000250000N))
"Elapsed time: 0.355177 msecs"
=> 45001000N

We want to find the number N such that the sum of the cubes of 1..N is some number X. To find out whether such a number exists, we can binary-search a range for it, applying the formula above to each candidate to see whether the result equals X. This approach works because sum-of-cube is an increasing function of n: any value f(n) that is too large means we must look for a smaller n, and any value f(n) that is too small means we must look for a larger n.

We choose a (larger than necessary, but easy and safe) range of 1 to X. We know that the number exists if the formula applied to a candidate yields X. If it does not, we continue the binary search until either we find the number or we have tried the same candidate twice, which indicates that the number does not exist.

The search needs at most O(log X) steps, so an input around 1E100 (1 googol) takes only about 1 millisecond, which makes this a very efficient algorithmic approach.
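
For comparison, here is a variation on the same idea written as an explicit loop with low and high bounds. It is a sketch, not the code above: the names cube-sum and find-n-bisect are hypothetical, and instead of searching up to X it first finds an upper bound by repeated doubling, which is my own substitution.

```clojure
;; Sketch with hypothetical names; a variation, not the answer's own code.
(defn cube-sum
  "Closed form (k(k+1)/2)^2 for 1^3 + ... + k^3."
  [k]
  (let [t (quot (*' k (inc k)) 2)]
    (*' t t)))

(defn find-n-bisect
  "Returns k with (cube-sum k) = x, or -1 if there is none.
  Assumes x is a non-negative integer."
  [x]
  (let [top (loop [h 1]                 ; double h until cube-sum reaches x
              (if (>= (cube-sum h) x) h (recur (*' 2 h))))]
    (loop [lo 0, hi top]                ; invariant: the answer, if any, is in [lo, hi]
      (if (> lo hi)
        -1
        (let [mid (quot (+' lo hi) 2)
              s   (cube-sum mid)]
          (cond
            (= s x) mid
            (< s x) (recur (inc' mid) hi)
            :else   (recur lo (dec' mid))))))))

(find-n-bisect 100)
;; => 4
(find-n-bisect 10)
;; => -1
```

The search stops when the interval becomes empty, so there is no need to remember the previous candidate.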

You may want to use some mathematical tricks.

(a-k)^3 + (a+k)^3 = 2a^3+(6k^2)a

So, a sum like:

(a-4)^3+(a-3)^3+(a-2)^3+(a-1)^3+a^3+(a+1)^3+(a+2)^3+(a+3)^3+(a+4)^3 
= 9a^3+180a

(please confirm correctness of the calculation).

Using this equation, instead of incrementing by 1 every time, you can jump by 9 (or by any 2k+1 you like). You can check for the exact number whenever you hit a bigger number than n.
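
To confirm the calculation, here is a small check (a sketch; the function names are hypothetical). Pairing terms with the identity above and using 1^2 + ... + k^2 = k(k+1)(2k+1)/6, the sum of the 2k+1 cubes centered on a works out to (2k+1)a^3 + a*k(k+1)(2k+1), which for k = 4 is 9a^3 + 180a.

```clojure
;; Hypothetical helper names, for illustration only.
(defn centered-cubes-direct
  "Sum of the 2k+1 cubes (a-k)^3 + ... + (a+k)^3, term by term."
  [a k]
  (reduce +' (map #(*' % % %) (range (- a k) (+ a k 1)))))

(defn centered-cubes-closed
  "Same sum via (2k+1)a^3 + a*k(k+1)(2k+1), obtained by pairing terms."
  [a k]
  (+' (*' (inc (* 2 k)) a a a)
      (*' a k (inc k) (inc (* 2 k)))))

;; k = 4 gives the 9a^3 + 180a case from the text (here with a = 10).
(= (centered-cubes-closed 10 4) (+ (* 9 1000) (* 180 10)))
;; => true

;; The closed form matches the direct sum on a grid of small cases.
(every? (fn [[a k]] (= (centered-cubes-direct a k) (centered-cubes-closed a k)))
        (for [a (range 0 50), k (range 0 10)] [a k]))
;; => true
```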

Another way to improve is to build a table of n's and sums by running a batch of computations once, and to use this table later in the function find-n.
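
A minimal sketch of the table idea, assuming a hash map keyed by the running sum is acceptable (the names build-table, cube-table and find-n-lookup are hypothetical, and the limit of 1000 is an arbitrary choice for illustration):

```clojure
;; Sketch; all names and the limit of 1000 are illustrative assumptions.
(defn build-table
  "Map from 1^3 + ... + k^3 to k, for k = 1..limit."
  [limit]
  (let [ks (range 1 (inc limit))]
    (zipmap (reductions +' (map #(*' % % %) ks)) ks)))

;; Built once, reused by every lookup.
(def cube-table (build-table 1000))

(defn find-n-lookup
  "Returns n if m is one of the precomputed sums, otherwise -1."
  [m]
  (get cube-table m -1))

(find-n-lookup 100)
;; => 4
(find-n-lookup 10)
;; => -1
```

Note that a table only answers for sums up to its limit: anything larger also comes back as -1, and covering the question's example (n = 45001000) would need tens of millions of entries.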
