
Prime Iterator in Julia

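Is there an (efficient) iterator that generates prime numbers in Julia? The built-in function primes(N) generates all primes up to N at once rather than on demand, and may be unusable when N is very large or unknown.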

You can filter a counter that runs through the (big) integers (a Base.Count{BigInt} iterator) with a probabilistic primality test:

iterprimes = filter(isprime,countfrom(big(2),1))

Then, for example:

julia> collect(take(iterprimes, 5))
5-element Array{Any,1}:
  2
  3
  5
  7
 11

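This is not as efficient as a sieve, but it does not keep a huge structure in memory. As I recall, isprime has no false positives at least up to 2^64 with the standard choice of repetitions.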

Edit:

A second possibility is to generate (see Generator) chunks of primes(N*(i-1)+1, N*i) and Base.flatten them into a single list:

Base.flatten(primes(1000000*(i-1)+1,1000000*i) for i in countfrom(1))

On this machine, this iterator actually beats plain primes for computing the first 10^9 primes.
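The chunking idea can be sketched for Julia 1.x using only Base. This is a minimal illustration, not the code that was benchmarked: `primes_between` is a hypothetical trial-division stand-in for `primes(lo, hi)` from Primes.jl, and `chunked_primes` is an assumed name.

```julia
# Hypothetical helper (not part of Primes.jl): all primes in lo:hi,
# found by plain trial division so that the sketch needs no packages.
function primes_between(lo::Int, hi::Int)
    out = Int[]
    for n in max(lo, 2):hi
        isp = true
        d = 2
        while d * d <= n
            if n % d == 0
                isp = false
                break
            end
            d += 1
        end
        isp && push!(out, n)
    end
    return out
end

# Lazily chain fixed-size chunks into one unbounded stream of primes;
# only one chunk is materialized at a time.
chunked_primes(chunk::Int = 1_000) =
    Iterators.flatten(primes_between(chunk * (i - 1) + 1, chunk * i)
                      for i in Iterators.countfrom(1))

first_ten = collect(Iterators.take(chunked_primes(), 10))
```

With Primes.jl installed, swapping `primes_between` for `primes` should give the sieve-backed behaviour described above; the chunk size trades memory per chunk against per-chunk overhead.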

Edit 2:

An iterator using GMP's mpz_nextprime (written in pre-1.0 Julia syntax):

type 
   PrimeIter
end
function nextprime(y::BigInt)
    x = BigInt()
    ccall((:__gmpz_nextprime,:libgmp), Void, (Ptr{BigInt},Ptr{BigInt}), &x, &y)
    x
end
Base.start(::PrimeIter) = big(2)
Base.next(::PrimeIter, state) = state, nextprime(state)
Base.done(::PrimeIter, _) = false
Base.iteratorsize(::PrimeIter) = Base.IsInfinite()


> first(drop(PrimeIter(), 10^5))
1299721

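You can take a look at Lazy.jl, which gives you prime iteration on demand. It works for an unknown, large upper bound. The assumption is that you want to use all primes below some limit and have the space to store them.

Quoting from their README: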

# isprime defined in terms of the prime numbers:
isprime(n) =
  @>> primes begin
    takewhile(x -> x<=sqrt(n))
    map(x -> n % x == 0)
    any; !
  end

# the prime numbers defined in terms of isprime:
primes = filter(isprime, range(2));

take(20, primes)
#> (2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71)

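To explain this code: first, the isprime function is defined in terms of the list of all primes, primes (which is not yet defined at that point), by taking all primes up to the square root of n, checking whether any of them divides n, and logically negating the result.

Then primes is defined as a filter of isprime over all integers from 2 onward.

To get all primes up to n, you can run @>> primes takewhile(p -> p <= n) instead of take.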
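The same mutual definition can be sketched without Lazy.jl, using the Julia 1.x iteration protocol. This is an illustrative sketch rather than the Lazy.jl implementation: `CachedPrimes` and `is_prime_by_cache` are hypothetical names, and the list of primes found so far is carried in the iteration state.

```julia
# Hypothetical type: an unbounded stream of primes in which each candidate
# is tested only against the primes already produced.
struct CachedPrimes end

Base.IteratorSize(::Type{CachedPrimes}) = Base.IsInfinite()
Base.eltype(::Type{CachedPrimes}) = Int

# n is prime if no earlier prime p with p * p <= n divides it.
function is_prime_by_cache(n::Int, found::Vector{Int})
    for p in found
        p * p > n && return true
        n % p == 0 && return false
    end
    return true
end

# State is (next candidate, primes found so far); each fresh iteration
# starts with its own empty cache.
Base.iterate(cp::CachedPrimes) = iterate(cp, (2, Int[]))
function Base.iterate(::CachedPrimes, state::Tuple{Int,Vector{Int}})
    candidate, found = state
    while !is_prime_by_cache(candidate, found)
        candidate += 1
    end
    push!(found, candidate)
    return candidate, (candidate + 1, found)
end

first_twenty = collect(Iterators.take(CachedPrimes(), 20))
# Iterators.takewhile needs Julia 1.4 or later.
up_to_thirty = collect(Iterators.takewhile(p -> p <= 30, CachedPrimes()))
```

As with the Lazy.jl version, the cache grows with every prime produced, so this suits cases where storing all primes below the bound is acceptable.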

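An alternative that saves storage, but will give you some non-primes, is to use a wheel; see Wheel Factorization. All you need to store is the last number found, and then you move to the next number on the wheel.

For example, handle 2 and 3 separately. Then, starting from 5, add 2 and 4 alternately: 5, 7, 11, 13, 17, 19, ... The resulting stream of numbers eliminates all multiples of 2 and 3. There are also more complex wheels that eliminate multiples of 5 or of higher primes as well.

This method spends some time dividing by non-primes, but saves on the storage needed. All wheels become less efficient as the numbers grow and primes become rarer. You will know the time and storage limits of your own system.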
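A minimal sketch of the 2-3 wheel in Julia 1.x follows. The names `Wheel23`, `is_prime_trial`, and `WheelPrimes` are hypothetical, and trial division is an assumed choice for filtering out the non-primes the wheel lets through.

```julia
# Candidates from a 2-3 wheel: 5, 7, 11, 13, 17, 19, 23, 25, ...
# The state is just (current candidate, next step); steps alternate 2, 4.
struct Wheel23 end
Base.IteratorSize(::Type{Wheel23}) = Base.IsInfinite()
Base.eltype(::Type{Wheel23}) = Int
function Base.iterate(::Wheel23, state::Tuple{Int,Int} = (5, 2))
    n, step = state
    return n, (n + step, 6 - step)
end

# Trial division that reuses the wheel as its source of divisors, so some
# divisors (25, 35, ...) are themselves non-primes.
function is_prime_trial(n::Int)
    n < 2 && return false
    n < 4 && return true                        # 2 and 3
    (n % 2 == 0 || n % 3 == 0) && return false
    for d in Wheel23()
        d * d > n && return true
        n % d == 0 && return false
    end
end

# Primes on demand: 2 and 3 are emitted first, then wheel candidates are
# filtered. Nothing is stored except the last candidate and the next step.
struct WheelPrimes end
Base.IteratorSize(::Type{WheelPrimes}) = Base.IsInfinite()
Base.eltype(::Type{WheelPrimes}) = Int
function Base.iterate(::WheelPrimes, state::Tuple{Int,Int} = (2, 0))
    n, step = state
    n == 2 && return 2, (3, 0)
    n == 3 && return 3, (5, 2)
    while !is_prime_trial(n)
        n, step = n + step, 6 - step
    end
    return n, (n + step, 6 - step)
end

wheel_candidates = collect(Iterators.take(Wheel23(), 8))
wheel_primes = collect(Iterators.take(WheelPrimes(), 10))
```

The first eight candidates include 25, which shows the kind of non-prime the wheel still produces and the filter has to reject.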

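You don't say what you consider a reasonable range for your iterator, or how long you are willing to wait for it. Such algorithms generally come in two forms: A) short but slower (a range of roughly a million per second), and B) more complex but much faster (about 100 times faster than A). Here is an example of each.

A) An iterator version based on the standard "Primes" package (pkg add "Primes"), in particular its nextprime function: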

using Primes: nextprime

mutable struct PrimesGen
    lastprime :: UInt64
    PrimesGen() = new()
end
Base.eltype(::Type{PrimesGen}) = UInt64 # matches the UInt64 values that iterate returns
Base.IteratorSize(::Type{PrimesGen}) = Base.IsInfinite() # the trait is defined on the type
function Base.iterate(PG::PrimesGen, st::UInt64 = UInt64(1)) # :: Union{Nothing,Tuple{UInt64,UInt64}}
    next = nextprime(st + 1)
    next, next
end

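EDIT_ADD: the above code is slightly faster than @mschauer's first solution (updated here to the current Julia version 1.0), because of the multiple nested iterators in the latter, which is as follows: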

using Primes: isprime
PrimesGen() = Iterators.filter(isprime, Iterators.countfrom(UInt64(2)))

but that version is very short and can be used in the same way... END_EDIT_ADD

You can use it to do something like the following:

using Printf
@time let sm = 0
          for p in PrimesGen() p >= 2_000_000 && break; sm += p end
          Printf.@printf("%d\n",sm)
      end

which produces the following:

142913828922
  0.651754 seconds (327.05 k allocations: 4.990 MiB)

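This is adequate for smaller ranges such as this one, for example for solving Euler Problem 10 as above (run on an Intel x5-Z8350 at 1.92 Gigahertz).

The above iterator actually has an "infinite" range, limited by the UInt64 number range, but that limit would not be reached for over 300 thousand years, so we don't really need to worry about it...

B) For "industrial strength" problems involving ranges of a billion or more, one needs an iterator (or direct function call) implementation of the Page Segmented Sieve of Eratosthenes, which is about a hundred times faster. It can be implemented as follows: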

const Prime = UInt64
const BasePrime = UInt32
const BasePrimesArray = Array{BasePrime,1}
const SieveBuffer = Array{UInt8,1}

# contains a lazy list of a secondary base primes arrays feed
# NOT thread safe; needs a Mutex gate to make it so...
abstract type BPAS end # stands in for BasePrimesArrays, not defined yet
mutable struct BasePrimesArrays <: BPAS
    thunk :: Union{Nothing,Function} # problem with efficiency - untyped function!!!!!!!!!
    value :: Union{Nothing,Tuple{BasePrimesArray, BPAS}}
    BasePrimesArrays(thunk::Function) = new(thunk)
end
Base.eltype(::Type{BasePrimesArrays}) = BasePrime
Base.IteratorSize(::Type{BasePrimesArrays}) = Base.SizeUnknown() # "infinite"...
function Base.iterate(BPAs::BasePrimesArrays, state::BasePrimesArrays = BPAs)
    if state.thunk !== nothing
        newvalue :: Union{Nothing,Tuple{BasePrimesArray, BasePrimesArrays}} =
            state.thunk() :: Union{Nothing,Tuple{BasePrimesArray
                                                 , BasePrimesArrays}}
        state.value = newvalue
        state.thunk = nothing
        return newvalue
    end
    state.value
end

# count the number of zero bits (primes) in a byte array,
# also works for part arrays/slices, best used as an `@view`...
function countComposites(cmpsts::AbstractArray{UInt8,1})
    foldl((a, b) -> a + count_zeros(b), cmpsts; init = 0)
end

# converts an entire sieved array of bytes into an array of UInt32 primes,
# to be used as a source of base primes...
function composites2BasePrimesArray(low::Prime, cmpsts::SieveBuffer)
    limiti = length(cmpsts) * 8
    len :: Int = countComposites(cmpsts)
    rslt :: BasePrimesArray = BasePrimesArray(undef, len)
    i :: Int = 0
    j :: Int = 1
    @inbounds(
    while i < limiti
        if cmpsts[i >>> 3 + 1] & (1 << (i & 7)) == 0
            rslt[j] = low + i + i
            j += 1
        end
        i += 1
    end)
    rslt
end

# sieving work done, based on low starting value for the given buffer and
# the given lazy list of base prime arrays...
function sieveComposites(low::Prime, buffer::Array{UInt8,1},
                                     bpas::BasePrimesArrays)
    lowi :: Int = (low - 3) ÷ 2
    len :: Int = length(buffer)
    limiti :: Int = len * 8 - 1
    nexti :: Int = lowi + limiti
    for bpa::BasePrimesArray in bpas
        for bp::BasePrime in bpa
            bpint :: Int = bp
            bpi :: Int = (bpint - 3) >>> 1
            starti :: Int = 2 * bpi * (bpi + 3) + 3
            starti >= nexti && return
            if starti >= lowi starti -= lowi
            else
                r :: Int = (lowi - starti) % bpint
                starti = r == 0 ? 0 : bpint - r
            end
            lmti :: Int = limiti - 40 * bpint
            @inbounds(
            if bpint <= (len >>> 2) && starti <= lmti
                for i in 1:8
                    if starti > limiti break end
                    mask = convert(UInt8,1) << (starti & 7)
                    c = starti >>> 3 + 1
                    while c <= len
                        buffer[c] |= mask
                        c += bpint
                    end
                    starti += bpint
                end
            else
                c = starti
                while c <= limiti
                    buffer[c >>> 3 + 1] |= convert(UInt8,1) << (c & 7)
                    c += bpint
                end
            end)
        end
    end
    return
end

# starts the secondary base primes feed with minimum size in bits set to 4K...
# thus, for the first buffer primes up to 8293,
# the seeded primes easily cover it as 97 squared is 9409.
function makeBasePrimesArrays() :: BasePrimesArrays
    cmpsts :: SieveBuffer = Array{UInt8,1}(undef, 512)
    function nextelem(low::Prime, bpas::BasePrimesArrays) ::
                                    Tuple{BasePrimesArray, BasePrimesArrays}
        # calculate size so that the bit span is at least as big as the
        # maximum culling prime required, rounded up to minsizebits blocks...
        reqdsize :: Int = 2 + isqrt(1 + low)
        size :: Int = (reqdsize ÷ 4096 + 1) * 4096 ÷ 8 # size in bytes
        if size > length(cmpsts) cmpsts = Array{UInt8,1}(undef, size) end
        fill!(cmpsts, 0)
        sieveComposites(low, cmpsts, bpas)
        arr :: BasePrimesArray = composites2BasePrimesArray(low, cmpsts)
        next :: Prime = low + length(cmpsts) * 8 * 2
        arr, BasePrimesArrays(() -> nextelem(next, bpas))
    end
    # pre-seeding breaks recursive race,
    # as only known base primes used for first page...
    preseedarr :: BasePrimesArray = # pre-seed to 100, can sieve to 10,000...
        [ 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41
        , 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97
        ]
    nextfunc :: Function = () ->
        (nextelem(convert(Prime,101), makeBasePrimesArrays()))
    firstfunc :: Function = () -> (preseedarr, BasePrimesArrays(nextfunc))
    BasePrimesArrays(firstfunc)
end

# an iterator over successive sieved buffer composite arrays,
# returning a tuple of the value represented by the lowest possible prime
# in the sieved composites array and the array itself;
# array has a 16 Kilobytes minimum size (CPU L1 cache), but
# will grow so that the bit span is larger than the
# maximum culling base prime required, possibly making it larger than
# the L1 cache for large ranges, but still reasonably efficient using
# the L2 cache: very efficient up to about 16e9 range;
# reasonably efficient to about 2.56e14 for two Megabyte L2 cache = > 1 week...
struct PrimesPages
    baseprimes :: BasePrimesArrays
    PrimesPages() = new(makeBasePrimesArrays())
end
Base.eltype(::Type{PrimesPages}) = SieveBuffer
Base.IteratorSize(::Type{PrimesPages}) = Base.IsInfinite()
function Base.iterate(PP::PrimesPages,
                      state :: Tuple{Prime,SieveBuffer} =
                            ( convert(Prime,3), Array{UInt8,1}(undef,16384) ))
    (low, cmpsts) = state
    # calculate size so that the bit span is at least as big as the
    # maximum culling prime required, rounded up to minsizebits blocks...
    reqdsize :: Int = 2 + isqrt(1 + low)
    size :: Int = (reqdsize ÷ 131072 + 1) * 131072 ÷ 8 # size in bytes
    if size > length(cmpsts) cmpsts = Array{UInt8,1}(undef, size) end
    fill!(cmpsts, 0)
    sieveComposites(low, cmpsts, PP.baseprimes)
    newlow :: Prime = low + length(cmpsts) * 8 * 2
    ( low, cmpsts ), ( newlow, cmpsts )
end

function countPrimesTo(range::Prime) :: Int64
    range < 3 && ((range < 2 && return 0) || return 1)
    count :: Int64 = 1
    for ( low, cmpsts ) in PrimesPages() # almost never exits!!!
        if low + length(cmpsts) * 8 * 2 > range
            lasti :: Int = (range - low) ÷ 2
            count += countComposites(@view cmpsts[1:lasti >>> 3])
            count += count_zeros(cmpsts[lasti >>> 3 + 1] |
                                 (0xFE << (lasti & 7)))
            return count
        end
        count += countComposites(cmpsts)
    end
    count
end

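It can be called as follows (PrimesPaged here is an iterator over the individual primes in the pages produced by PrimesPages; its definition is not included in the listing above):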

using Printf
@time let sm = 0
          for p in PrimesPaged() p >= 2_000_000 && break; sm += p end
          Printf.@printf("%d\n",sm)
      end

which produces the following:

142913828922
  0.016245 seconds (60 allocations: 23.891 KiB)

but that is barely enough to "warm it up"; it can be called with a limit a hundred times larger to produce the following:

1075207199997334
  1.381198 seconds (2.35 k allocations: 103.875 KiB)

and it can count all the primes up to a billion with the following code:

println(@time let count = 0
                  for p in PrimesPaged()
                      p > 1_000_000_000 && break
                      count += 1
                  end; count end)

which produces the following:

6.802044 seconds (11.51 k allocations: 396.734 KiB)
50847534

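However, it takes longer to iterate over the sieved primes than it does to sieve them in the first place. This can be shown by calling the provided optimized counting function, which eliminates most of the enumeration time, as follows: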

println(@time countPrimesTo(Prime(1_000_000_000)))

which produces the following:

1.959057 seconds (65 allocations: 39.266 KiB)
50847534

In a similar way, one could write a sumPrimesTo function that sums the sieved primes up to a billion in only somewhat more time (say, twice as long)...
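To illustrate summing straight from the composite bits, here is a simplified sketch. It is not page-segmented and does not use the PrimesPages machinery above: it sieves one odds-only, bit-packed buffer and uses the same convention as the listing (bit i stands for an odd number, a zero bit means prime). The name `sum_primes_to_sketch` is hypothetical.

```julia
# Sum all primes <= limit by sieving the odd numbers into a bit-packed
# buffer (bit i stands for 3 + 2i; a set bit marks a composite) and then
# summing directly from the buffer, without an iterator over primes.
function sum_primes_to_sketch(limit::Int)
    limit < 2 && return 0
    limit < 3 && return 2
    nbits = (limit - 3) ÷ 2 + 1                 # odd candidates 3, 5, ..., <= limit
    cmpsts = zeros(UInt8, (nbits + 7) >>> 3)
    i = 0
    while true
        p = 3 + 2i
        start = (p * p - 3) ÷ 2                 # bit index of p squared
        start >= nbits && break
        if cmpsts[i >>> 3 + 1] & (0x01 << (i & 7)) == 0   # p is prime
            for c in start:p:(nbits - 1)        # odd multiples of p are p bits apart
                cmpsts[c >>> 3 + 1] |= 0x01 << (c & 7)
            end
        end
        i += 1
    end
    total = 2                                   # the only even prime
    for j in 0:(nbits - 1)
        if cmpsts[j >>> 3 + 1] & (0x01 << (j & 7)) == 0
            total += 3 + 2j
        end
    end
    return total
end

euler10 = sum_primes_to_sketch(2_000_000)
```

A page-segmented version would apply the same final loop to each `(low, cmpsts)` pair, using `low + 2j` in place of `3 + 2j` and trimming the last page at the limit as countPrimesTo does.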

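All tests here were run on the same x5-Z8350 CPU at 1.92 Gigahertz.

This shows that, for truly huge problems, one should not use an iterator but rather custom functions that operate directly on the culled page segments, as the countPrimesTo function does here. Once that is done, further optimizations become worthwhile, such as maximum wheel factorization (a gain of about four times in sieving speed) and multi-threading (a gain proportional to the number of effective CPU cores used, including Hyper-Threaded ones), with the end result not much slower than Kim Walisch's "primesieve".

Again, this has the same UInt64 "infinite" limit, but it would still take hundreds of years to get there, so we still don't need to worry about it.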
