Why are some signed integer types faster than unsigned integer types in Julia?
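I have been studying abstract types, mainly the difference between the signed and unsigned integer types. I was curious about Julia's performance, so I measured the differences between these types. Since Julia is mostly implemented in C, I assumed the behavior across these abstract types would be almost the same. The question "Performance of unsigned vs signed integers" gives some explanations of why unsigned integers give the same or better performance than signed integers. In my benchmarks, however, I found that some signed integers are faster than their unsigned counterparts. Here is some reproducible code: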
julia> using BenchmarkTools
julia> Int_8 = rand(Int8, 1000)
julia> Int_16 = rand(Int16, 1000)
julia> Int_32 = rand(Int32, 1000)
julia> Int_64 = rand(Int64, 1000)
julia> Int_128 = rand(Int128, 1000)
julia> UInt_8 = rand(UInt8, 1000)
julia> UInt_16 = rand(UInt16, 1000)
julia> UInt_32 = rand(UInt32, 1000)
julia> UInt_64 = rand(UInt64, 1000)
julia> UInt_128 = rand(UInt128, 1000)
julia> @benchmark Int_8.^2
BenchmarkTools.Trial: 10000 samples with 147 evaluations.
Range (min … max): 698.333 ns … 54.836 μs ┊ GC (min … max): 0.00% … 98.34%
Time (median): 733.088 ns ┊ GC (median): 0.00%
Time (mean ± σ): 870.727 ns ± 2.209 μs ┊ GC (mean ± σ): 11.56% ± 4.49%
▇█▅▄▂▁ ▇▇▄▂▁ ▂▁ ▁ ▁▁ ▂
██████████████▇▆▆▆██████▇██████▇▆▅▆▆▄▄▅▆▆▆▄▄▅▆▆▆▆▆▅▅▄▅▄▄▄▅▃▄ █
698 ns Histogram: log(frequency) by time 1.28 μs <
Memory estimate: 1.14 KiB, allocs estimate: 5.
julia> @benchmark Int_16.^2
BenchmarkTools.Trial: 10000 samples with 16 evaluations.
Range (min … max): 986.625 ns … 3.895 ms ┊ GC (min … max): 0.00% … 99.93%
Time (median): 1.077 μs ┊ GC (median): 0.00%
Time (mean ± σ): 3.069 μs ± 77.347 μs ┊ GC (mean ± σ): 50.40% ± 2.00%
██▃ ▂▅▆▆▅▄▄▃▂▂▁ ▂
████▆▇▆▅▅▅▆▅▅▃▆▆▄████████████▇▅▅▅▅▄▄▆▇▆▆▇▇▇█▇▇▆▆▅▆▄▅▅▄▅▅▆▄▅▅ █
987 ns Histogram: log(frequency) by time 3.68 μs <
Memory estimate: 2.14 KiB, allocs estimate: 5.
julia> @benchmark Int_32.^2
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
Range (min … max): 1.327 μs … 4.701 ms ┊ GC (min … max): 0.00% … 99.89%
Time (median): 1.474 μs ┊ GC (median): 0.00%
Time (mean ± σ): 3.481 μs ± 80.410 μs ┊ GC (mean ± σ): 39.99% ± 1.73%
▆█▆▃ ▁▄▄▄▄▃▃▂▁▁ ▁
█████▇▆▆▆▆▅▄▅▅▅▄▄▃▅███████████▇▇▅▅▅▆▆▆▄▅▅▆▅▆▄▆▅▅▇▆▅▅▅▆▅▅▆▆ █
1.33 μs Histogram: log(frequency) by time 5.82 μs <
Memory estimate: 4.14 KiB, allocs estimate: 5.
julia> @benchmark Int_64.^2
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
Range (min … max): 1.749 μs … 2.949 ms ┊ GC (min … max): 0.00% … 99.73%
Time (median): 1.887 μs ┊ GC (median): 0.00%
Time (mean ± σ): 5.069 μs ± 84.920 μs ┊ GC (mean ± σ): 52.89% ± 3.15%
▇█▆▃▁ ▂▃▃▃▂▂▂▁▁ ▂
█████▇▇▇█▇▆▅▅▅▅▅▄▅▄▄▅▁▃▅▃▅▅███████████▇▇▆▅▃▅▄▁▄▅▄▅▆▅▇▆▅▅▆▆ █
1.75 μs Histogram: log(frequency) by time 7.31 μs <
Memory estimate: 8.02 KiB, allocs estimate: 5.
julia> @benchmark Int_128.^2
BenchmarkTools.Trial: 10000 samples with 3 evaluations.
Range (min … max): 8.516 μs … 5.690 ms ┊ GC (min … max): 0.00% … 99.72%
Time (median): 8.778 μs ┊ GC (median): 0.00%
Time (mean ± σ): 10.962 μs ± 96.809 μs ┊ GC (mean ± σ): 15.28% ± 1.73%
▄█▅▃▁ ▁
██████▇▇▆▆▆▆▇▆▆▆▆▅▄▃▃▄▃▂▃▄▄▄▄▃▄▄▅▆▆▆▆▆▆▆▄▅▄▄▃▃▄▆▆▆▆▆▅▄▄▅▆▆▆ █
8.52 μs Histogram: log(frequency) by time 19.2 μs <
Memory estimate: 15.83 KiB, allocs estimate: 5.
julia> @benchmark UInt_8.^2
BenchmarkTools.Trial: 10000 samples with 104 evaluations.
Range (min … max): 845.058 ns … 76.695 μs ┊ GC (min … max): 0.00% … 98.21%
Time (median): 897.909 ns ┊ GC (median): 0.00%
Time (mean ± σ): 1.027 μs ± 2.643 μs ┊ GC (mean ± σ): 9.58% ± 3.68%
██▇▆▃▆█▇▅▄▃▂▁ ▁▁▁▁▁▂▁▁ ▁ ▃
███████████████▇▇▆▇█████████████▇▆▆▆▆▅▅▆▆▆▆▆▆▅▆▆▆▅▅▆▅▅▄▅▄▅▁▄ █
845 ns Histogram: log(frequency) by time 1.59 μs <
Memory estimate: 1.14 KiB, allocs estimate: 5.
julia> @benchmark UInt_16.^2
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
Range (min … max): 1.032 μs … 6.110 ms ┊ GC (min … max): 0.00% … 99.96%
Time (median): 1.139 μs ┊ GC (median): 0.00%
Time (mean ± σ): 2.458 μs ± 83.275 μs ┊ GC (mean ± σ): 47.86% ± 1.41%
██▅▂▁ ▁▂▃▂▂▁▁ ▂
███████▆▆▆▆▆▄▇▆▆▇███████████▆▄▅▆▆▄▄▄▃▃▃▄▃▄▃▅▁▃▄▅▃▃▄▄▅▄▅▆▆▆ █
1.03 μs Histogram: log(frequency) by time 3.83 μs <
Memory estimate: 2.14 KiB, allocs estimate: 5.
julia> @benchmark UInt_32.^2
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
Range (min … max): 1.205 μs … 4.694 ms ┊ GC (min … max): 0.00% … 99.91%
Time (median): 1.332 μs ┊ GC (median): 0.00%
Time (mean ± σ): 3.821 μs ± 101.562 μs ┊ GC (mean ± σ): 59.39% ± 2.23%
▇█▆▃ ▁▂▂▂▂▂▁ ▂
▇██████▆▇▇▆▆▆▇▇▅▅▄▄▁▃▃▄▄▃▄▇█████████▇▇▇▆▇▅▅▅▄▁▄▄▄▆▇▆▆▆▄▅▆▆▆ █
1.2 μs Histogram: log(frequency) by time 4.42 μs <
Memory estimate: 4.14 KiB, allocs estimate: 5.
julia> @benchmark UInt_64.^2
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
Range (min … max): 1.695 μs … 2.845 ms ┊ GC (min … max): 0.00% … 99.81%
Time (median): 1.842 μs ┊ GC (median): 0.00%
Time (mean ± σ): 4.734 μs ± 84.073 μs ┊ GC (mean ± σ): 56.11% ± 3.16%
▅██▆▃▁ ▁▁▁▁ ▂
███████▇▆▆▇▇▆▆▄▄▅▄▃▄▅▆▆▅▅▁▃▃▃▄▄▄▁▅▃▅▅▇█████████▇▇▇▆▆▄▆▆▄▄▃ █
1.7 μs Histogram: log(frequency) by time 5.71 μs <
Memory estimate: 8.02 KiB, allocs estimate: 5.
julia> @benchmark UInt_128.^2
BenchmarkTools.Trial: 10000 samples with 4 evaluations.
Range (min … max): 8.474 μs … 4.597 ms ┊ GC (min … max): 0.00% … 99.69%
Time (median): 8.691 μs ┊ GC (median): 0.00%
Time (mean ± σ): 10.980 μs ± 87.765 μs ┊ GC (mean ± σ): 15.96% ± 1.99%
▇█▆▄▃▁▁▁ ▁
█████████████▇█▇▇▇▆▆▆▅▅▄▆▄▄▄▄▅▄▄▂▄▅▄▄▄▅▆▇▇▇▆▇▇█▇▇▇▆▆▆▆▆▇▅▅▄ █
8.47 μs Histogram: log(frequency) by time 17 μs <
Memory estimate: 15.83 KiB, allocs estimate: 5.
julia> @btime $Int_8.^2
91.008 ns (1 allocation: 1.06 KiB)
julia> @btime $Int_16.^2
319.261 ns (1 allocation: 2.06 KiB)
julia> @btime $Int_32.^2
535.193 ns (1 allocation: 4.06 KiB)
julia> @btime $Int_64.^2
923.778 ns (1 allocation: 7.94 KiB)
julia> @btime $Int_128.^2
7.706 μs (1 allocation: 15.75 KiB)
julia> @btime $UInt_8.^2
89.870 ns (1 allocation: 1.06 KiB)
julia> @btime $UInt_16.^2
316.495 ns (1 allocation: 2.06 KiB)
julia> @btime $UInt_32.^2
539.642 ns (1 allocation: 4.06 KiB)
julia> @btime $UInt_64.^2
872.708 ns (1 allocation: 7.94 KiB)
julia> @btime $UInt_128.^2
7.061 μs (1 allocation: 15.75 KiB)
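As you can see in the @btime results, UInt8 and UInt16 are slightly faster than Int8 and Int16. But then Int32 becomes faster than UInt32. Is there a reason for this? Int64 and Int128 are slower again. So at some point the ordering between unsigned and signed integers flips, and the two lines cross over.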
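So I'm wondering whether someone can explain why some signed integers are faster than unsigned integers, and vice versa. Shouldn't the difference be more linear, instead of crossing over from one width to the next as it does now?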
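To put the "slightly faster" into numbers, the sketch below only recomputes signed/unsigned ratios from the @btime minima in the transcript above (converted to nanoseconds). Nothing is re-measured; a ratio above 1 means the signed type was slower in that particular run.

```julia
# Minimum times in ns copied from the @btime runs above: (width, signed, unsigned)
btime_ns = [
    ("8-bit",   91.008,  89.870),
    ("16-bit",  319.261, 316.495),
    ("32-bit",  535.193, 539.642),
    ("64-bit",  923.778, 872.708),
    ("128-bit", 7706.0,  7061.0),   # 7.706 μs and 7.061 μs
]

# Ratio signed/unsigned: > 1 means the signed type was slower in that run
ratios = [(bits, s / u) for (bits, s, u) in btime_ns]

for (bits, r) in ratios
    println(rpad(bits, 8), " signed/unsigned = ", round(r; digits = 3))
end
```

Only the 32-bit pair comes out below 1, and every pair is within about 10% (roughly 1.013, 1.009, 0.992, 1.059 and 1.091). For comparison, the Int8 minimum on its own moved from 91.0 ns (@btime) to 81.2 ns in the later @benchmark run, a swing of about 12% with no change of type at all.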
Edit: added full @benchmark runs with interpolated variables, as @Shayan suggested in the comments.
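Here are those benchmarks. Now it seems that UInt8 and UInt32 are faster than their signed counterparts, while Int64 and Int128 are faster than their unsigned counterparts, so there are still some differences between these abstract types.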
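In these interpolated runs the allocation count drops from 5 to 1, but each call still allocates the 1000-element result, and GC accounts for roughly 27% to 79% of the mean time, which is why the means sit far above the medians. A minimal sketch, assuming only rough minimum times are wanted and without BenchmarkTools: square into a buffer allocated once, outside the timed region, and keep the best of many @elapsed runs. min_square_time_ns is a hypothetical helper, and @elapsed has more overhead and coarser resolution than BenchmarkTools, so the printed numbers are indicative only.

```julia
# Hypothetical helper: square `x` into a preallocated buffer and return the
# best (minimum) wall time in nanoseconds over `trials` runs, plus the result.
function min_square_time_ns(x::Vector{T}; trials::Int = 1_000) where {T<:Integer}
    out = similar(x)            # allocate once, outside the timed region
    out .= x .^ 2               # warm-up run so compilation is not timed
    best = Inf
    for _ in 1:trials
        t = @elapsed (out .= x .^ 2)   # in-place: no allocation while timing
        best = min(best, t)
    end
    return best * 1e9, out
end

for T in (Int8, UInt8, Int16, UInt16, Int32, UInt32, Int64, UInt64)
    x = rand(T, 1000)
    t, out = min_square_time_ns(x)
    @assert out == x .^ 2       # fixed-width integer arithmetic wraps on overflow
    println(rpad(string(T), 7), round(t; digits = 1), " ns")
end
```

Taking the minimum rather than the mean is a deliberate choice here: it filters out GC pauses and scheduler noise, which are exactly what dominate the means in the transcripts above.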
julia> @benchmark $Int_8.^2
BenchmarkTools.Trial: 10000 samples with 958 evaluations.
Range (min … max): 81.209 ns … 9.620 μs ┊ GC (min … max): 0.00% … 97.35%
Time (median): 115.392 ns ┊ GC (median): 0.00%
Time (mean ± σ): 221.416 ns ± 855.670 ns ┊ GC (mean ± σ): 43.50% ± 10.98%
█▃ ▁
██▆▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▃▃▄▃▄▅▆ █
81.2 ns Histogram: log(frequency) by time 7.39 μs <
Memory estimate: 1.06 KiB, allocs estimate: 1.
julia> @benchmark $Int_16.^2
BenchmarkTools.Trial: 10000 samples with 241 evaluations.
Range (min … max): 312.037 ns … 259.117 μs ┊ GC (min … max): 0.00% … 99.79%
Time (median): 363.174 ns ┊ GC (median): 0.00%
Time (mean ± σ): 1.899 μs ± 18.716 μs ┊ GC (mean ± σ): 77.72% ± 7.83%
▄▆██▃ ▂▂▁ ▁ ▂
█████████████▇▆▆▆█▇▇▆▆▅▆▆▅▄▅▆▅▅▄▄▅▄▃▄▄▄▄▆▅▆▆▆▇▇██████▇▇▇▇▆▅▅▄ █
312 ns Histogram: log(frequency) by time 1.35 μs <
Memory estimate: 2.06 KiB, allocs estimate: 1.
julia> @benchmark $Int_32.^2
BenchmarkTools.Trial: 9034 samples with 194 evaluations.
Range (min … max): 520.572 ns … 251.764 μs ┊ GC (min … max): 0.00% … 99.71%
Time (median): 583.026 ns ┊ GC (median): 0.00%
Time (mean ± σ): 2.824 μs ± 22.005 μs ┊ GC (mean ± σ): 77.60% ± 9.84%
▇█▄▁▂▃▁ ▁▁ ▁
████████████▇▆▇▆▅▃▅▄▅▄▃▄▄▄▄▄▁▄▅▆▆▆▆▇▇▇▆▆▅▄▅▅▅▃▄▄▁▁▃▁▄▁▁▁▁▁▁▁▃ █
521 ns Histogram: log(frequency) by time 3.06 μs <
Memory estimate: 4.06 KiB, allocs estimate: 1.
julia> @benchmark $Int_64.^2
BenchmarkTools.Trial: 10000 samples with 44 evaluations.
Range (min … max): 867.795 ns … 653.472 μs ┊ GC (min … max): 0.00% … 99.82%
Time (median): 1.009 μs ┊ GC (median): 0.00%
Time (mean ± σ): 3.341 μs ± 36.073 μs ┊ GC (mean ± σ): 68.34% ± 6.30%
▃▆▇██▆▆▄▂▁ ▁ ▂
██████████▇▆▆▇▇▆▇▇▆▅▅▇▇█████▇▇▅▆▄▅▄▃▄▄▃▃▄▁▅▆▇▅▄▅▄▄▅▅▁▄▅▃▃▃▄▄▃ █
868 ns Histogram: log(frequency) by time 2.94 μs <
Memory estimate: 7.94 KiB, allocs estimate: 1.
julia> @benchmark $Int_128.^2
BenchmarkTools.Trial: 10000 samples with 5 evaluations.
Range (min … max): 6.683 μs … 3.352 ms ┊ GC (min … max): 0.00% … 99.74%
Time (median): 7.721 μs ┊ GC (median): 0.00%
Time (mean ± σ): 10.562 μs ± 96.224 μs ┊ GC (mean ± σ): 27.31% ± 2.99%
▇▂▅▅ █▅ ▁
████▇▄▆███▇▅▆▆▅▅▆▃▄▂▄▅▄▄▄▄▄▄▄▅▅▃▄▃▄▄▂▄▂▃▂▄▃▃▆▆▆▆▄▄▆▆▇▇▆▆▆▅▇ █
6.68 μs Histogram: log(frequency) by time 14.9 μs <
Memory estimate: 15.75 KiB, allocs estimate: 1.
julia> @benchmark $UInt_8.^2
BenchmarkTools.Trial: 10000 samples with 967 evaluations.
Range (min … max): 80.428 ns … 10.368 μs ┊ GC (min … max): 0.00% … 98.18%
Time (median): 108.816 ns ┊ GC (median): 0.00%
Time (mean ± σ): 217.285 ns ± 835.138 ns ┊ GC (mean ± σ): 43.71% ± 11.08%
█▃ ▁
██▅▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▄▅▅▅▅ █
80.4 ns Histogram: log(frequency) by time 7.23 μs <
Memory estimate: 1.06 KiB, allocs estimate: 1.
julia> @benchmark $UInt_16.^2
BenchmarkTools.Trial: 8270 samples with 320 evaluations.
Range (min … max): 315.113 ns … 202.421 μs ┊ GC (min … max): 0.00% … 99.80%
Time (median): 361.734 ns ┊ GC (median): 0.00%
Time (mean ± σ): 1.899 μs ± 16.395 μs ┊ GC (mean ± σ): 79.09% ± 9.07%
▅▆█▆▂▃▁▂▂▁ ▁
██████████▇▆▆███▇▆▆▆▆▅▅▅▅▅▄▁▅▄▄▄▅▃▄▃▃▄▄▅▆▅▆▆▇▇▇▇▇▇▇█▆▇▆▆▄▆▅▄▆ █
315 ns Histogram: log(frequency) by time 1.35 μs <
Memory estimate: 2.06 KiB, allocs estimate: 1.
julia> @benchmark $UInt_32.^2
BenchmarkTools.Trial: 9148 samples with 194 evaluations.
Range (min … max): 525.278 ns … 236.308 μs ┊ GC (min … max): 0.00% … 99.74%
Time (median): 582.335 ns ┊ GC (median): 0.00%
Time (mean ± σ): 2.782 μs ± 21.821 μs ┊ GC (mean ± σ): 78.12% ± 9.84%
▇█▄▂▂▃▁ ▁ ▁
███████▇████▇▆▆▅▆▆▄▄▄▄▄▄▃▃▃▄▃▃▄▅▅▆▅▅▅▆▆▅▆▅▁▃▄▁▁▁▁▁▁▄▁▁▃▁▁▁▁▁▃ █
525 ns Histogram: log(frequency) by time 2.92 μs <
Memory estimate: 4.06 KiB, allocs estimate: 1.
julia> @benchmark $UInt_64.^2
BenchmarkTools.Trial: 10000 samples with 45 evaluations.
Range (min … max): 859.467 ns … 592.092 μs ┊ GC (min … max): 0.00% … 99.80%
Time (median): 988.933 ns ┊ GC (median): 0.00%
Time (mean ± σ): 3.448 μs ± 36.452 μs ┊ GC (mean ± σ): 70.21% ± 6.60%
▁▃▄█▃
▃▅█████▇▅█▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▂▂▁▂ ▃
859 ns Histogram: frequency by time 2.3 μs <
Memory estimate: 7.94 KiB, allocs estimate: 1.
julia> @benchmark $UInt_128.^2
BenchmarkTools.Trial: 10000 samples with 5 evaluations.
Range (min … max): 7.691 μs … 4.544 ms ┊ GC (min … max): 0.00% … 99.77%
Time (median): 7.794 μs ┊ GC (median): 0.00%
Time (mean ± σ): 11.989 μs ± 115.585 μs ┊ GC (mean ± σ): 27.18% ± 2.82%
█▄▃▃▂▂▁ ▁ ▁
█████████████▇▇▇▇▇▆▇▆▇▆▇██▇▇██▇▆▇▅▆▅▅▅▆▅▅▅▅▅▄▅▅▃▄▅▅▅▅▅▅▅▄▅▄▅ █
7.69 μs Histogram: log(frequency) by time 20.6 μs <
Memory estimate: 15.75 KiB, allocs estimate: 1.
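This is not a type-related problem. How signed and unsigned numbers are implemented does not depend on the language; it is a matter of how the compiler implements them for the target architecture.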
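A small illustration of where signedness reaches the machine level and where it does not. With two's-complement wrapping arithmetic, the bits of x * x are the same whether the operands are read as signed or unsigned, so Int32 and UInt32 squaring can use the same multiply instruction (LLVM's integer types carry no sign; @code_llvm on x * x for each type is one way to check on a given machine). Division is different: for unsigned values, halving is exactly a right shift, but signed ÷ truncates toward zero while >> rounds toward negative infinity, so the compiler has to emit a correction. The function names below are illustrative only.

```julia
# Squaring: signed and unsigned give the same bit pattern (mod 2^32)
function same_bits_when_squared(x::Int32)
    u = reinterpret(UInt32, x)          # same 32 bits, read as unsigned
    return reinterpret(UInt32, x * x) == u * u
end

# Halving: for unsigned values, ÷ 2 is exactly a logical right shift.
# For signed values, ÷ truncates toward zero but >> rounds toward -Inf,
# so a plain shift is wrong for negative odd inputs.
unsigned_halving_is_shift(u::UInt32) = u ÷ UInt32(2) == u >> 1
signed_halving_is_shift(x::Int32) = x ÷ Int32(2) == x >> 1

xs = vcat(rand(Int32, 10_000), typemin(Int32), typemax(Int32), Int32(-3))
println(all(same_bits_when_squared, xs))                         # true
println(all(unsigned_halving_is_shift, reinterpret(UInt32, xs))) # true
println(signed_halving_is_shift(Int32(-3)))  # false: -3 ÷ 2 == -1, -3 >> 1 == -2
```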
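Suppose (this is not actually the case, but the same reasoning applies) that your CPU did not support signed numbers (or unsigned numbers). Because the C standard requires both signed and unsigned numbers, one of them would map directly onto the types the architecture handles, and the other would have to be emulated in software. The problem is the same as in the days when CPUs had no floating-point support. If you can't do floating-point math, you can't live in C (well, maybe you can, but I can't), so in that case the compiler (or the operating system) installs trap handlers that intercept the execution of coprocessor instructions and handle them in software. You would see your benchmark numbers get worse in ways that are far from simple.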
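To make the hypothetical concrete: if a machine only had an unsigned less-than, a signed comparison could still be emulated by flipping the sign bit of both operands first, which maps the signed range onto the unsigned range in the same order. The sketch below (function and constant names are hypothetical) checks this against Julia's built-in signed <. The emulated version costs two extra XORs per comparison, which is the kind of per-operation overhead the answer describes, although real x86-64 and ARM CPUs provide both signed and unsigned comparisons natively.

```julia
const SIGN_BIT_32 = 0x80000000   # UInt32 with only the top bit set

# Hypothetical software emulation of signed 32-bit `<` using only unsigned
# operations: XOR with the sign bit maps typemin(Int32)..typemax(Int32)
# onto 0x00000000..0xffffffff while preserving order.
function emulated_signed_lt(a::Int32, b::Int32)
    ua = reinterpret(UInt32, a) ⊻ SIGN_BIT_32
    ub = reinterpret(UInt32, b) ⊻ SIGN_BIT_32
    return ua < ub               # plain unsigned comparison
end

as, bs = rand(Int32, 10_000), rand(Int32, 10_000)
println(all(emulated_signed_lt(a, b) == (a < b) for (a, b) in zip(as, bs)))  # true
```

Without the XOR, a raw unsigned comparison would get mixed-sign pairs wrong: -1 reinterpreted as UInt32 is 0xffffffff, which compares greater than 0.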