Removing specified characters from a string - Efficient methods (time and space complexity)

Here is the problem: Remove specified characters from a given string.

Input: The string is "Hello World!" and characters to be deleted are "lor"
Output: "He Wd!"

Solving this involves two sub-parts:

  1. Determining if the given character is to be deleted
  2. If so, then deleting the character

To solve the first part, I am reading the characters to be deleted into a std::unordered_map, i.e. I parse the string "lor" and insert each character into the hash map. Later, when I am parsing the main string, I look up each character as a key in this hash map, and if the returned value is non-zero, I delete the character from the string.

Question 1: Is this the best approach?

Question 2: Which would be better for this problem, std::map or std::unordered_map? Since I am not interested in ordering, I used an unordered_map. But is there a higher overhead for creating the hash table? What should one do in such situations: use a map (balanced tree) or an unordered_map (hash table)?

Now coming to the next part, i.e. deleting the characters from the string. One approach is to delete the character and shift the data from that point on back by one position. In the worst case, where we have to delete all the characters, this would take O(n^2).

The second approach would be to copy only the required characters to another buffer. This would involve allocating enough memory to hold the original string and copying it over character by character, leaving out the ones that are to be deleted. Although this requires additional memory, it would be an O(n) operation.
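The second approach can be sketched roughly as follows. The name remove_chars_copy is hypothetical, and for brevity the membership test uses std::string::find rather than the hash map described above, so each lookup is linear in the number of characters to delete. That is an assumption of this sketch, not part of the original code.

```cpp
#include <string>

// Hypothetical helper: returns a copy of `str` without any character
// that appears in `chars`. The input string is left untouched.
std::string remove_chars_copy(const std::string& str, const std::string& chars)
{
    std::string result;
    result.reserve(str.size());   // worst case: nothing gets deleted

    for (char c : str)
    {
        // Linear search of `chars`; acceptable while the deletion set is short.
        if (chars.find(c) == std::string::npos)
            result.push_back(c);
    }
    return result;
}
```

Calling remove_chars_copy("Hello World!", "lor") should yield "He Wd!" while leaving the caller's original string available.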

The third approach would be to start reading and writing from the 0th position, incrementing the source pointer every time I read and the destination pointer only when I write. Since the source pointer will always be at or ahead of the destination pointer, I can write over the same buffer. This saves memory and is also an O(n) operation. This is what I am doing, calling resize at the end to remove the leftover characters.

Here is the function I have written:

// str contains the string (Hello World!)
// chars contains the characters to be deleted (lor)
void remove_chars(string& str, const string& chars)
{
    unordered_map<char, int> chars_map;

    for(string::size_type i = 0; i < chars.size(); ++i)
        chars_map[chars[i]] = 1;

    string::size_type i = 0; // source
    string::size_type j = 0; // destination
    while(i < str.size())
    {
        if(chars_map[str[i]] != 0)
            ++i;
        else
        {
            str[j] = str[i];
            ++i;
            ++j;
        }
    }

    str.resize(j);
}

Question 3: What are the different ways in which I can improve this function? Or is this the best we can do?

Thanks!

Well done. Now learn about the standard library algorithms and Boost:

str.erase(std::remove_if(str.begin(), str.end(), boost::is_any_of("lor")), str.end());
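If Boost is not available, the same erase-remove idiom can be sketched with only the standard library. The function name remove_chars_std and the lambda are illustrative assumptions, and the predicate does a linear std::string::find per character instead of using boost::is_any_of.

```cpp
#include <algorithm>
#include <string>

// Hypothetical standard-library-only variant of the one-liner above.
void remove_chars_std(std::string& str, const std::string& chars)
{
    // Predicate: true for every character that should be dropped.
    auto is_unwanted = [&chars](char c) {
        return chars.find(c) != std::string::npos;
    };

    // remove_if moves the kept characters to the front and returns the new
    // logical end; erase then trims the leftover tail.
    str.erase(std::remove_if(str.begin(), str.end(), is_unwanted), str.end());
}
```

This performs the same in-place compaction as the hand-written loop in the question, with the final erase playing the role of resize.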

Assuming that you're studying algorithms, and not interested in library solutions:

Hash tables are most valuable when the number of possible keys is large, but you only need to store a few of them. Your hash table would make sense if you were deleting specific 32-bit integers from digit sequences. But with ASCII characters, it's overkill.

Just make an array of 256 bools and set a flag for the characters you want to delete. It only uses one table lookup instruction per input character. A hash map involves at least a few more instructions to compute the hash function. Space-wise, a hash map is probably no more compact once you add up all the auxiliary data.

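void remove_chars(string& str, const string& chars)
{
    // set up the look-up table
    std::vector<bool> discard(256, false);
    for (string::size_type i = 0; i < chars.size(); ++i)
    {
        // cast to unsigned char: plain char may be signed, giving a negative index
        discard[static_cast<unsigned char>(chars[i])] = true;
    }

    for (string::size_type j = 0; j < str.size(); ++j)
    {
        if (discard[static_cast<unsigned char>(str[j])])
        {
            // do something, depending on your storage choice
        }
    }
}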

Regarding your storage choices: Choose between options 2 and 3 depending on whether you need to preserve the input data or not. Option 3 is the most efficient, but you don't always want an in-place procedure.
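As one possible way to combine the 256-entry table with the in-place option 3, here is a minimal sketch. The name remove_chars_table is hypothetical, std::array is swapped in for the vector, and the code assumes 8-bit chars so that every unsigned char value fits in the table.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical in-place variant: flag table for membership, two indices
// (read and write) for compaction. Assumes CHAR_BIT == 8.
void remove_chars_table(std::string& str, const std::string& chars)
{
    std::array<bool, 256> discard{};   // value-initialized: all false
    for (char c : chars)
        discard[static_cast<unsigned char>(c)] = true;

    std::size_t dst = 0;               // next write position
    for (std::size_t src = 0; src < str.size(); ++src)
    {
        // Keep the character only if its flag is not set.
        if (!discard[static_cast<unsigned char>(str[src])])
            str[dst++] = str[src];
    }
    str.resize(dst);                   // drop the leftover tail
}
```

For option 2, the same table could be reused while appending the kept characters to a separate output string instead of overwriting str.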

Here is a KISS solution with many advantages:

void remove_chars (char *dest, const char *src, const char *excludes)
{
    do {
        if (!strchr (excludes, *src))
            *dest++ = *src;
    } while (*src++);
    *dest = '\000';
}

You can ping-pong between strcspn and strspn to avoid the need for a hash table:

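#include <string.h>

void remove_chars(
    const char *input,
    char *output,
    const char *characters)
{
    const char *next_input = input;
    char *next_output = output;

    while (*next_input != '\0')
    {
        /* copy the run of characters that are NOT in the deletion set */
        size_t copy_length = strcspn(next_input, characters);
        memcpy(next_output, next_input, copy_length);

        next_output += copy_length;
        next_input += copy_length;

        /* skip the run of characters that ARE in the deletion set */
        next_input += strspn(next_input, characters);
    }

    *next_output = '\0'; /* terminate the result */
}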
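To make the ping-pong idea concrete, the sketch below records the alternating runs instead of copying them. It relies on strcspn returning the length of the leading run containing none of the listed characters, and strspn returning the length of the leading run made up only of them. The helper split_runs is hypothetical and exists only for illustration.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: splits `input` into alternating runs.
// Even-indexed entries are runs to keep, odd-indexed entries are runs to drop.
std::vector<std::string> split_runs(const char* input, const char* characters)
{
    std::vector<std::string> runs;
    while (*input != '\0')
    {
        // Leading run with no character from `characters`: this part is kept.
        std::size_t keep = std::strcspn(input, characters);
        runs.emplace_back(input, keep);
        input += keep;

        // Leading run made only of `characters`: this part is dropped.
        std::size_t drop = std::strspn(input, characters);
        runs.emplace_back(input, drop);
        input += drop;
    }
    return runs;
}
```

For "Hello World!" with "lor", the runs come out as "He", "llo", " W", "orl", "d!" and a final empty run, so concatenating the even-indexed entries gives "He Wd!".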
