
How does std::vector::end() iterator work in memory?

Today, I was attempting to extract a subset of N elements from a vector of size M, where N < M. I realised that I did not need to create a new copy; I only needed to modify the original, and, moreover, I could simply take the first N elements.

A few brief searches turned up many answers, the most attractive one being resize(), which appears to truncate the vector down to the requested length and deal neatly with the memory issues of erasing the other elements.

However, before I came across vector.resize(), I was trying to point vector.end() to the N+1'th position. I knew this wouldn't work, but I wanted to try it regardless. This would leave the elements past the N'th position "stranded", and I believe (correct me if I'm wrong) this would be an example of a memory leak.

Looking at the iterator validity notes at http://www.cplusplus.com/reference/vector/vector/resize/ , we see that if the vector shrinks, vector.end() stays the same, and if it expands, vector.end() will move (although that is irrelevant to our case).

This leads me to ask: what is the underlying mechanic of vector.end()? Where does it lie in memory? It can be found by incrementing an iterator pointing to the last element in the vector, e.g. auto iter = &vector.back(); iter++;, but is this what actually happens in memory?

I can accept that vector.begin() always refers to the first element, but on resize, it appears that vector.end() can lie somewhere other than just past the last element in the vector.

For some reason, I can't seem to find the answer, even though it sounds like something a very basic computer science course would cover. I suppose it is STL-specific, as there are probably many implementations of a vector / list that all differ...

Sorry for the long post about a simple question!

You asked about "the underlying mechanic of vector.end()". Well, here is (a snippet of) an oversimplified vector that is easy to digest:

#include <cstddef> // std::size_t

template <class T>
class Simplified_vector
{
public:
    using iterator = T*;
    using const_iterator = const T*;

private:
   T* buffer_ = nullptr;        // start of the dynamically allocated block
   std::size_t size_ = 0;       // number of elements in use
   std::size_t capacity_ = 0;   // number of elements the block can hold

public:

   auto push_back(const T& val) -> void
   {
       if (size_ + 1 > capacity_)
       {
           // buffer increase logic
           //
           // this usually means allocating a new, larger buffer,
           // copying/moving the elements from the old buffer to the new one,
           // deleting the old buffer,
           // and making `buffer_` point to the new buffer
           // (along with modifying `capacity_` to reflect the new buffer size)
           //
           // the strong exception guarantee makes things a bit more complicated,
           // but this is the gist of it
       }

       buffer_[size_] = val;
       ++size_;
   }

   auto begin() const -> const_iterator
   {
       return buffer_;
   }

   auto begin() -> iterator
   {
       return buffer_;
   }

   auto end() const -> const_iterator
   {
       return buffer_ + size_;   // computed on every call, never stored
   }

   auto end() -> iterator
   {
       return buffer_ + size_;   // computed on every call, never stored
   }
};

Also see the question "Can std::vector<T>::iterator simply be T*?" for why T* is a perfectly valid iterator for std::vector<T>.
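
The snippet above leaves the growth logic as comments. For something you can actually compile and run, here is a minimal sketch that fills it in under simplifying assumptions: the class name TinyVector is hypothetical, storage comes from new T[] (so T must be default-constructible and copy-assignable), capacity doubles, copying the container is disabled, and there is no exception-safety guarantee. A real std::vector uses an allocator and constructs elements in place. The point it shows is the same: end() is recomputed from buffer_ + size_ on every call.

```cpp
#include <algorithm>  // std::copy
#include <cassert>
#include <cstddef>    // std::size_t

// Hypothetical minimal vector: pointer + size + capacity, doubling growth.
template <class T>
class TinyVector
{
public:
    using iterator = T*;
    using const_iterator = const T*;

    TinyVector() = default;
    TinyVector(const TinyVector&) = delete;             // copying omitted for brevity
    TinyVector& operator=(const TinyVector&) = delete;
    ~TinyVector() { delete[] buffer_; }                 // the whole block is released here

    void push_back(const T& val)
    {
        if (size_ == capacity_)
        {
            // Allocate a larger block, copy the old elements over,
            // release the old block and repoint buffer_ at the new one.
            std::size_t new_capacity = capacity_ == 0 ? 1 : capacity_ * 2;
            T* new_buffer = new T[new_capacity];
            std::copy(buffer_, buffer_ + size_, new_buffer);
            delete[] buffer_;
            buffer_ = new_buffer;
            capacity_ = new_capacity;
        }
        buffer_[size_] = val;
        ++size_;
    }

    // Precondition: size() > 0. Only the count shrinks; the block is kept.
    void pop_back() { --size_; }

    std::size_t size() const { return size_; }
    std::size_t capacity() const { return capacity_; }

    iterator begin() { return buffer_; }
    iterator end() { return buffer_ + size_; }          // derived, not stored
    const_iterator begin() const { return buffer_; }
    const_iterator end() const { return buffer_ + size_; }

private:
    T* buffer_ = nullptr;
    std::size_t size_ = 0;
    std::size_t capacity_ = 0;
};
```

After five push_back calls the capacity has grown 1, 2, 4, 8, end() - begin() is 5, and a pop_back moves end() back by exactly one element while the capacity stays at 8.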


Now, with this implementation in mind, let's address a few of your misconceptions:

I was trying to point the vector.end() to the N+1'th position.

This is not possible. In this simplified class, the end iterator is not stored directly; as you can see, it is computed from the beginning of the buffer plus the size (number of elements) of the container. Moreover, you cannot manipulate it directly. The internal workings of the class make sure end() returns an iterator pointing one past the last element in the buffer, and you cannot change this. What you can do is insert or remove elements, and end() will reflect those changes.
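
The same holds for a real std::vector. In this sketch (the helper names end_offset and track_end are hypothetical, chosen just for illustration) end() is fetched again after each modification, and the only thing that moves it is a change in the number of elements.

```cpp
#include <cassert>
#include <cstddef>  // std::ptrdiff_t
#include <vector>

// Hypothetical helper: distance from begin() to end(), in elements.
// For std::vector this always equals size().
std::ptrdiff_t end_offset(const std::vector<int>& v)
{
    return v.end() - v.begin();
}

// Records where end() lies after each modification.
std::vector<std::ptrdiff_t> track_end()
{
    std::vector<int> v{1, 2, 3};
    std::vector<std::ptrdiff_t> offsets;
    offsets.push_back(end_offset(v));   // 3 elements

    v.push_back(4);                     // adding an element moves end() forward
    offsets.push_back(end_offset(v));

    v.erase(v.begin(), v.begin() + 2);  // removing two elements moves it back
    offsets.push_back(end_offset(v));

    // end() returns a fresh iterator by value each time, so there is no
    // way to "repoint" it; modifying a copy never affects the vector.
    return offsets;
}
```

track_end() returns {3, 4, 2}.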

and I believe (correct me if i'm wrong) this would be an example of a memory leak.

You are wrong. Even if you somehow made end() point to something other than what it is supposed to point to, that wouldn't be a memory leak. A memory leak would only occur if you lost every reference to the dynamically allocated internal buffer.
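
To make the "no leak" point concrete, here is a small sketch (the struct ShrinkReport and function shrink_and_inspect are hypothetical names) that shrinks a std::vector and checks that it still points at the same memory block. Shrinking does not reallocate, so the vector never loses its reference to the buffer, and its destructor frees the whole block later.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct ShrinkReport
{
    bool same_buffer;             // data() still points at the original block
    std::size_t capacity_before;
    std::size_t capacity_after;
    std::size_t size_after;
};

// Hypothetical helper; assumes n <= 100 so resize() only shrinks.
ShrinkReport shrink_and_inspect(std::size_t n)
{
    std::vector<int> v(100, 7);
    const int* before = v.data();
    std::size_t cap_before = v.capacity();

    v.resize(n);  // destroys elements [n, 100); shrinking does not reallocate

    return {v.data() == before, cap_before, v.capacity(), v.size()};
}   // v's destructor releases the entire block here, so nothing leaks
```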

The "end" of any contiguous container (like a vector or an array) is always one element beyond the last element of the container.

So for an array (or vector) of X elements, the "end" is index X (remember that, since indexes are zero-based, the last index is X - 1).
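
A short sketch of that index arithmetic, first for a built-in array and then for std::vector (the function names are hypothetical). It also confirms the relationship asked about in the question: the address one past the last element equals the end position.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>  // std::end
#include <vector>

// For an array of X elements, the end is the address of index X,
// one past the last valid index X - 1.
bool array_end_is_index_x()
{
    int arr[5] = {10, 20, 30, 40, 50};
    const std::size_t x = 5;
    int* last = &arr[x - 1];           // last element, index X - 1
    int* one_past = std::end(arr);     // one past the last element
    return one_past == arr + x && one_past == last + 1;
}

// The same relationship for std::vector, with size() playing the role of X.
bool vector_end_is_index_x(const std::vector<int>& v)
{
    return !v.empty()
        && v.end() - v.begin() == static_cast<std::ptrdiff_t>(v.size())
        && &v.back() + 1 == v.data() + v.size();
}
```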

This is illustrated well in, e.g., this vector::end reference.

If you shrink your vector, the last index will of course also change, meaning that the "end" will change as well. If the end iterator appears not to change, that means you saved it before you shrank the vector. Shrinking changes the size and invalidates all iterators to the removed elements, including the old end iterator.

If you change the size of a vector, by adding new elements or by removing elements, then you must re-fetch the end iterator. The existing iterator objects you have will not automatically be updated.
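
A common place where this matters is erasing inside a loop. In this sketch (remove_evens is a hypothetical name) the loop never caches end(); it calls v.end() on every check and continues from the iterator that erase() returns, because erase() invalidates the erased position and everything after it, including end(). In real code the erase/std::remove_if idiom is usually preferred; the explicit loop is only here to show the re-fetching.

```cpp
#include <cassert>
#include <vector>

// Removes even numbers in place, re-fetching end() on every iteration.
void remove_evens(std::vector<int>& v)
{
    for (auto it = v.begin(); it != v.end(); /* advanced in the body */)
    {
        if (*it % 2 == 0)
            it = v.erase(it);  // returns an iterator to the element after the erased one
        else
            ++it;
    }
}
```

For {1, 2, 3, 4, 5, 6} the result is {1, 3, 5}.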

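The end isn't something you set directly. Conceptually, a vector keeps track of three things (implementations such as libstdc++ and libc++ actually store them as three pointers, one of which marks the end, but the effect is the same):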

  1. A pointer to the first element. If you call begin(), this is what you get back.
  2. The size of the memory block that's been managed. If you call capacity() you get back the number of elements that can fit in this allocated memory.
  3. The number of elements that are in use. These are elements that have been constructed and are in the first part of the memory block. The rest of the memory is unused, but is available for new elements. If the entire capacity gets filled, to add more elements the vector will allocate a larger block of memory, copy (or, since C++11, move) all the elements into it, and deallocate the original block.
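
The difference between the memory block (capacity) and the elements in use (size) can be observed directly. In this sketch (Layout, layout_of and reserve_then_push are hypothetical names) reserve() grows the capacity without moving end(), and push_back() within the reserved capacity moves end() without touching the capacity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Layout
{
    std::size_t size;           // elements in use
    std::size_t capacity;       // elements the current block can hold
    std::ptrdiff_t end_offset;  // end() - begin()
};

Layout layout_of(const std::vector<int>& v)
{
    return {v.size(), v.capacity(), v.end() - v.begin()};
}

// Snapshots the layout after reserve() and again after two push_back() calls.
std::vector<Layout> reserve_then_push()
{
    std::vector<int> v;
    std::vector<Layout> steps;

    v.reserve(10);                 // bigger block, but no new elements
    steps.push_back(layout_of(v));

    v.push_back(1);                // fits in the reserved block: no reallocation
    v.push_back(2);
    steps.push_back(layout_of(v));
    return steps;
}
```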

Whichever way it is stored, end() always equals begin() + size(). So yes, end() is an iterator that points one beyond the last element.

So end() isn't something you can move directly. It only changes when you add or remove elements.

If you want to extract N elements, you can do so by reading those from begin() up to (but not including) begin() + N:

for( auto it = vec.begin(); it != vec.begin() + n; ++it )
{
    // do something with the element (*it) here; n must not exceed vec.size()
}

Many STL algorithms take a pair of iterators for the beginning and end of the range of elements you want to work with. In your case, you can use vec.begin() and vec.begin() + n as the beginning and end of the range you're interested in.
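
For example, here is a sketch (sort_prefix and sum_prefix are hypothetical names) that runs std::sort and std::accumulate on only the first n elements, leaving the rest of the vector untouched. n is clamped to size() so that begin() + n never goes past end().

```cpp
#include <algorithm>  // std::sort, std::min
#include <cassert>
#include <cstddef>
#include <numeric>    // std::accumulate
#include <vector>

// Sorts only [v.begin(), v.begin() + n); the remaining elements keep their order.
void sort_prefix(std::vector<int>& v, std::size_t n)
{
    n = std::min(n, v.size());
    std::sort(v.begin(), v.begin() + static_cast<std::ptrdiff_t>(n));
}

// Sums only the first n elements.
int sum_prefix(const std::vector<int>& v, std::size_t n)
{
    n = std::min(n, v.size());
    return std::accumulate(v.begin(), v.begin() + static_cast<std::ptrdiff_t>(n), 0);
}
```

With v = {5, 3, 1, 4, 2}, sort_prefix(v, 3) gives {1, 3, 5, 4, 2}.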

If you want to throw away the elements after n, you can call vec.resize(n). The vector will then destroy the elements you don't need. It might not change the size of the memory block it manages; the vector might keep the memory around in case you add more elements again. That's an implementation detail of the vector class you're using.
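
If you also want to ask the vector to give back the unused memory, C++11 added shrink_to_fit(). This sketch (keep_first_n is a hypothetical name) truncates and then makes that request. Note that shrink_to_fit() is non-binding, so capacity() is only guaranteed to be at least size() afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Keeps the first n elements, then requests that spare capacity be released.
void keep_first_n(std::vector<int>& v, std::size_t n)
{
    if (n < v.size())
        v.resize(n);     // destroys elements [n, size()); the block is kept
    v.shrink_to_fit();   // non-binding request to reduce capacity() to size()
}
```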

The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM