简体   繁体   中英

Removing duplicates from an vector is giving unexpected results

Given a vector of integers, iterate through the vector and check whether there are more than one of the same number. In that case, remove them so that only a single index of the vector contains that number. Here are a few examples:

 vector<int> arr {1,1,1,1} When arr is printed out the result should be 1. vector<int> arr {1,2,1,2} When arr is printed out the result should be 1,2. vector<int> arr {1,3,2} When arr is printed out the result should be 1,3,2.

I know there are many solutions regarding this, but I want to solve it using my method. The solutions I've looked at use a lot of built-in functions, which I don't want to get too comfortable with as a beginner. I want to practice my problem-solving skills.

This is my code:

#include <iostream>
#include <vector>

using namespace std;

int main()
{
    vector<int> arr {1,1,1,1,2,1,1,1,1,1};

    for (int i {}; i < arr.size(); ++i)
    {
        int counter {};
        
        for (int j {}; j < arr.size(); ++j)
        {
            if (arr.at(i) == arr.at(j))
            {
                counter++;
                
                if (counter > 1)
                    arr.erase(arr.begin()+j);
            }
        }
    }
    
    //Prints out the vector arr
    
    for (auto value : arr)
    {
        cout << value << endl;
    }

    return 0;
}

The thing is that it works for the most part, except a few cases which have me confused.

For instance:

vector<int> arr {1,1,1,1,2,1,1,1,1,1}
When arr is printed out the result is 1,2,1 instead of 1,2.

However, in this case:

vector<int> arr {1,1,1,1,2,1,1,1}
When arr is printed out the result is 1,2.

It seems to work in the vast majority of cases, but when a number repeats itself a lot of times in the vector, it seems to not work, and I can't seem to find a reason for this.

I am now asking you to firstly tell me the cause of the problem, and then to give me guidance on how I should tackle this problem using my solution.

The machine I'm using has a pre C++11 compiler so this is an answer in the old fashioned C++. The easy way around this is to erase backwards. That way you don't have to worry about the size. Also, instead of using a for loop, which may be optimised, use a while loop.

#include <iostream>
#include <vector>

int main()
{
    int dummy[] = {1,1,1,1,2,1,1,1,1,1};
    std::vector<int> arr(dummy, dummy + sizeof(dummy)/sizeof(dummy[0]));

    size_t ii = 0;
    while (ii < arr.size())
    {
        // Save the value for a little efficiency
        int value = arr[ii];

        // Go through backwards only as far as ii.
        for (size_t jj = arr.size() - 1; jj > ii; --jj)
        {
            if (value == arr[jj])
                arr.erase(arr.begin() + jj);
        }
        ++ii;
    }
    
    
//Prints out the vector arr
    
    for (size_t ii = 0; ii < arr.size(); ++ii)
    {
        std::cout << arr[ii] << std::endl;
    }

    return 0;
}

As mentioned in the comments, when you erase a found duplicate (at index j ), you are potentially modifying the position of the element at index i .

So, after you have called arr.erase(arr.begin() + j) , you need to adjust i accordingly, if it was referring to an element that occurs after the removed element.

Here's a "quick fix" for your function:

#include <iostream>
#include <vector>

int main()
{
    std::vector<int> arr{ 1,1,1,1,2,1,1,1,1,1 };
    for (size_t i{}; i < arr.size(); ++i) {
        int counter{};
        for (size_t j{}; j < arr.size(); ++j) {
            if (arr.at(i) == arr.at(j)) {
                counter++;
                if (counter > 1) {
                    arr.erase(arr.begin() + j);
                    if (i >= j) --i; // Adjust "i" if it's after the erased element.
                }
            }
        }
    }

    //Prints out the vector arr
    for (auto value : arr) {
        std::cout << value << std::endl;
    }
    return 0;
}

As also mentioned in the comments, there are other ways of making the function more efficient; however, you have stated that you want to "practice your own problem-solving skills" (which is highly commendable), so I shall stick to offering a fix for your immediate issue.

This inner for loop is incorrect

    int counter {};
    
    for (int j {}; j < arr.size(); ++j)
    {
        if (arr.at(i) == arr.at(j))
        {
            counter++;
            
            if (counter > 1)
                arr.erase(arr.begin()+j);
        }
    }

If an element was removed the index j shall not be increased. Otherwise the next element after deleted will be bypassed because all elements after the deleted element in the vector are moved one position left.

Using the variable counter is redundant. Just start the inner loop with j = i + 1 .

Using your approach the program can look the following way

#include <iostream>
#include <vector>

int main()
{
    std::vector<int> arr{ 1,1,1,1,2,1,1,1,1,1 };

    for ( decltype( arr )::size_type i = 0; i < arr.size(); ++i)
    {
        for ( decltype( arr )::size_type j = i + 1; j < arr.size(); )
        {
            if (arr.at( i ) == arr.at( j ))
            {
                arr.erase( arr.begin() + j );
            }
            else
            {
                j++;
            }
        }
    }


    //Prints out the vector arr

    for (auto value : arr)
    {
        std::cout << value << std::endl;
    }
}

The program output is

1
2

This approach when each duplicated element is deleted separately is inefficient. It is better to use the so called erase-remove idiom.

Here is a demonstration program.

#include <iostream>
#include <vector>
#include <iterator>
#include <algorithm>

int main()
{
    std::vector<int> arr{ 1,1,1,1,2,1,1,1,1,1 };

    for (auto first = std::begin( arr ); first != std::end( arr ); ++first)
    {
        arr.erase( std::remove( std::next( first ), std::end( arr ), *first ), std::end( arr ) );
    }

    //Prints out the vector arr

    for (auto value : arr)
    {
        std::cout << value << std::endl;
    }

}

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM