Removing specified characters from a string - Efficient methods (time and space complexity)

Question

Here is the problem: Remove specified characters from a given string.

Input: The string is "Hello World!" and characters to be deleted are "lor"
Output: "He Wd!"

Solving this involves two sub-parts:

Determining if the given character is to be deleted
If so, then deleting the character

To solve the first part, I read the characters to be deleted into a std::unordered_map, i.e. I parse the string "lor" and insert each character into the hash map. Later, when I parse the main string, I look up each character as a key in this hash map, and if the returned value is non-zero, I delete the character from the string.

Question 1: Is this the best approach?

Question 2: Which would be better for this problem, std::map or std::unordered_map? Since I am not interested in ordering, I used an unordered_map. But is there a higher overhead for creating the hash table? What should I use in such situations: a map (balanced tree) or an unordered_map (hash table)?

Now for the next part, i.e. deleting the characters from the string. One approach is to delete each character and shift the data after it back by one position. In the worst case, where every character has to be deleted, this takes O(n^2) time.
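To make the quadratic cost concrete, here is a minimal sketch of this first approach. The name remove_chars_shift is hypothetical, and membership is tested with a plain std::string::find to keep the sketch short.

```cpp
#include <string>

// Hypothetical name, for illustration only: approach 1 (erase and shift).
// Each erase() moves every later character one slot to the left, so
// deleting k characters from a string of length n costs O(n * k).
void remove_chars_shift(std::string& str, const std::string& chars)
{
    std::string::size_type i = 0;
    while (i < str.size())
    {
        if (chars.find(str[i]) != std::string::npos)
            str.erase(i, 1);   // tail shifts left; do not advance i
        else
            ++i;
    }
}
```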

The second approach would be to copy only the required characters to another buffer. This would involve allocating enough memory to hold the original string and copying it over character by character, leaving out the ones that are to be deleted. Although this requires additional memory, it is an O(n) operation.
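A minimal sketch of this copying approach, assuming a hypothetical function name remove_chars_copy. The membership test here is a linear std::string::find over the deletion set, so the loop is O(n) only when that set is small; a constant-time lookup can be swapped in.

```cpp
#include <string>

// Hypothetical name, for illustration only: approach 2 (copy the
// surviving characters into a fresh buffer, leaving the input intact).
std::string remove_chars_copy(const std::string& str, const std::string& chars)
{
    std::string result;
    result.reserve(str.size());   // worst case: nothing is removed
    for (char c : str)
    {
        if (chars.find(c) == std::string::npos)
            result.push_back(c);  // keep characters not in the deletion set
    }
    return result;
}
```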

The third approach is to start reading and writing from position 0, incrementing the source index every time I read and the destination index only when I write. Since the source index is always equal to or ahead of the destination index, I can write over the same buffer. This saves memory and is also an O(n) operation. This is what I am doing, and I call resize at the end to drop the leftover characters.

Here is the function I have written:

// str contains the string (Hello World!)
// chars contains the characters to be deleted (lor)
void remove_chars(string& str, const string& chars)
{
    unordered_map<char, int> chars_map;

    for(string::size_type i = 0; i < chars.size(); ++i)
        chars_map[chars[i]] = 1;

    string::size_type i = 0; // source
    string::size_type j = 0; // destination
    while(i < str.size())
    {
        if(chars_map[str[i]] != 0)
            ++i;
        else
        {
            str[j] = str[i];
            ++i;
            ++j;
        }
    }

    str.resize(j);
}

Question 3: In what ways can I improve this function? Or is this the best we can do?

Thanks!

Answer 1

Good job. Now learn about the standard library algorithms and Boost:

str.erase(std::remove_if(str.begin(), str.end(), boost::is_any_of("lor")), str.end());
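If Boost is not available, the same erase-remove idiom can be sketched with only the standard library. The function name remove_chars_std and the lambda predicate are assumptions made for this example, and a C++11 compiler is assumed.

```cpp
#include <algorithm>
#include <string>

// Hypothetical name, for illustration only: erase-remove idiom with a
// lambda standing in for boost::is_any_of.
void remove_chars_std(std::string& str, const std::string& chars)
{
    str.erase(std::remove_if(str.begin(), str.end(),
                             [&chars](char c) {
                                 // true when c is one of the characters to delete
                                 return chars.find(c) != std::string::npos;
                             }),
              str.end());
}
```

std::remove_if compacts the kept characters to the front and returns the new logical end; erase then trims the leftover tail.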

Answer 2

Assuming that you're studying algorithms, and not interested in library solutions:

Hash tables are most valuable when the number of possible keys is large, but you only need to store a few of them. Your hash table would make sense if you were deleting specific 32-bit integers from digit sequences. But with ASCII characters, it's overkill.

Just make an array of 256 bools and set a flag for each character you want to delete. It uses only one table lookup per input character, whereas a hash map needs at least a few more instructions to compute the hash function. Space-wise, a hash map is probably no more compact once you add up all its auxiliary data.

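#include <string>
#include <vector>

void remove_chars(std::string& str, const std::string& chars)
{
    // set up the look-up table, indexed by unsigned char value
    std::vector<bool> discard(256, false);
    for (std::string::size_type i = 0; i < chars.size(); ++i)
    {
        // cast first: plain char may be signed, which would give a negative index
        discard[static_cast<unsigned char>(chars[i])] = true;
    }

    for (std::string::size_type j = 0; j < str.size(); ++j)
    {
        if (discard[static_cast<unsigned char>(str[j])])
        {
            // do something, depending on your storage choice
        }
    }
}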

Regarding your storage choices: choose between approaches 2 and 3 depending on whether you need to preserve the input data. Approach 3 is the most efficient, but you don't always want an in-place procedure.
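As one way to fill in the placeholder, here is a minimal sketch that combines the lookup table with the in-place approach 3 from the question. The name remove_chars_table is hypothetical, and the table size of 256 assumes 8-bit chars.

```cpp
#include <array>
#include <string>

// Hypothetical name, for illustration only: lookup table plus in-place
// compaction. Assumes char is 8 bits wide.
void remove_chars_table(std::string& str, const std::string& chars)
{
    std::array<bool, 256> discard{};   // value-initialised: all false
    for (char c : chars)
        discard[static_cast<unsigned char>(c)] = true;

    std::string::size_type j = 0;      // destination index
    for (std::string::size_type i = 0; i < str.size(); ++i)
    {
        // copy forward only the characters that are not flagged
        if (!discard[static_cast<unsigned char>(str[i])])
            str[j++] = str[i];
    }
    str.resize(j);                     // drop the leftover tail
}
```

For the copying variant (approach 2), the same loop would append to a second string instead of writing back into str.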

Answer 3

Here is a KISS solution with many advantages:

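#include <string.h>

void remove_chars (char *dest, const char *src, const char *excludes)
{
    do {
        /* strchr also matches the terminating '\0' of excludes, so the
           terminator of src is never copied inside the loop */
        if (!strchr (excludes, *src))
            *dest++ = *src;
    } while (*src++);
    *dest = '\0';   /* terminate the output explicitly */
}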

Answer 4

You can ping pong between strcspn and strspn to avoid the need for a hash table:

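#include <string.h>

void remove_chars(
    const char *input,
    char *output,
    const char *characters)
{
    const char *next_input= input;
    char *next_output= output;

    while (*next_input!='\0')
    {
        /* strcspn: length of the leading run with no characters to remove */
        size_t copy_length= strcspn(next_input, characters);
        memcpy(next_output, next_input, copy_length);

        next_output+= copy_length;

        next_input+= copy_length;
        /* strspn: length of the following run of characters to remove */
        next_input+= strspn(next_input, characters);
    }
    *next_output= '\0'; /* terminate the output */
}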

4 answers

solution1 3 2012-01-12 22:10:43
solution2 2 ACCEPTED 2012-01-12 22:22:36
solution3 1 2012-01-12 22:32:53
solution4 0 2012-01-12 23:30:01
