简体   繁体   中英

What is the difference between `UnitRange` and `Array`?

I have two versions of code that seem to do the same thing:

sum = 0
for x in 1:100
    sum += x
end
sum = 0
for x in collect(1:100)
    sum += x
end

Is there a practical difference between the two approaches?

UnitRange will use constant space, while an Array will grow in proportion to the number of elements you are iterating over.

You can see the difference using the BenchmarkTools package.

julia> @btime sum(1:1000000)
  0.044 ns (0 allocations: 0 bytes)
500000500000

julia> @btime sum(collect(1:1000000))
  1.123 ms (2 allocations: 7.63 MiB)
500000500000

There is a significant difference in both space and time performance between the two approaches. UnitRange avoids the work of allocating a big chunk of memory when you do not need it all at once. When possible, prefer to use UnitRange over Array .

In Julia, 1:100 returns a particular struct called UnitRange that looks like this:

julia> dump(1:100)
UnitRange{Int64}
  start: Int64 1
  stop: Int64 100

This is a very compact struct to represent ranges with step 1 and arbitrary (finite) size. UnitRange is subtype of AbstractRange , a type to represent ranges with arbitrary step, subtype of AbstractVector .

The instances of UnitRange dynamically compute their elements whenever the you use getindex (or the syntactic sugar vector[index] ). For example, with @less (1:100)[3] you can see this method:

function getindex(v::UnitRange{T}, i::Integer) where {T<:OverflowSafe}
    @_inline_meta
    val = v.start + (i - 1)
    @boundscheck _in_unit_range(v, val, i) || throw_boundserror(v, i)
    val % T
end

This is returning the i -th element of the vector by adding i - 1 to the first element ( start ) of the range. Some functions have optimised methods with UnitRange , or more generally with AbstractRange . For instance, with @less sum(1:100) you can see the following

function sum(r::AbstractRange{<:Real})
    l = length(r)
    # note that a little care is required to avoid overflow in l*(l-1)/2
    return l * first(r) + (iseven(l) ? (step(r) * (l-1)) * (l>>1)
                                     : (step(r) * l) * ((l-1)>>1))
end

This method uses the formula for the sum of an arithmetic progression , which is extremely efficient as it's evaluated in a time independent of the size of the vector.

On the other hand, collect(1:100) returns a plain Vector with one hundred elements 1, 2, 3, ..., 100. The main difference with UnitRange (or other types of AbstractRange ) is that getindex(vector::Vector, i) (or vector[i] , with vector::Vector ) doesn't do any computation but simply accesses the i -th element of the vector. The downside of a Vector over a UnitRange is that generally speaking there aren't efficient methods when working with them as the elements of this container are completely arbitrary, while UnitRange represents a set of numbers with peculiar properties (sorted, constant step, etc...).

If you compare the performance of methods for which UnitRange has super-efficient implementations, this type will win hands down (note the use of interpolation of variables with $(...) when using macros from BenchmarkTools ):

julia> using BenchmarkTools

julia> @btime sum($(1:1000_000))
  0.012 ns (0 allocations: 0 bytes)
500000500000

julia> @btime sum($(collect(1:1000_000)))
  229.979 μs (0 allocations: 0 bytes)
500000500000

Remember that UnitRange comes with the cost of dynamically computing the elements every time you access them with getindex . Consider for example this function:

function test(vec)
    sum = zero(eltype(vec))
    for idx in eachindex(vec)
        sum += vec[idx]
    end
    return sum
end

Let's benchmark it with a UnitRange and a plain Vector :

julia> @btime test($(1:1000_000))
  812.673 μs (0 allocations: 0 bytes)
500000500000

julia> @btime test($(collect(1:1000_000)))
  522.828 μs (0 allocations: 0 bytes)
500000500000

In this case the function calling the plain array is faster than the one with a UnitRange because it doesn't have to dynamically compute 1 million elements.

Of course, in these toy examples it'd be more sensible to iterate over all elements of vec rather than its indices, but in real world cases a situation like these may be more sensible. This last example, however, shows that a UnitRange is not necessarily more efficient than a plain array, especially if you need to dynamically compute all of its elements. UnitRange s are more efficient when you can take advantage of specialised methods (like sum ) for which the operation can be performed in constant time.

As a file remark, note that if you originally have a UnitRange it's not necessarily a good idea to convert it to a plain Vector to get good performance, especially if you're going to use it only once or very few times, as the conversion to Vector involves itself the dynamic computation of all elements of the range and the allocation of the necessary memory:

julia> @btime collect($(1:1000_000));
  422.435 μs (2 allocations: 7.63 MiB)

julia> @btime test(collect($(1:1000_000)))
  882.866 μs (2 allocations: 7.63 MiB)
500000500000

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM