
Remove repeating characters from string

I have a string, e.g. acaddef or bbaaddgg. I have to remove all repeating characters from it as fast as possible. So, for example, pooaatat should become poat and ggaatpop should become gatpo. Is there a built-in function or algorithm that does this quickly? I searched the STL, but without a satisfying result.

Okay, so here are 4 different solutions. Each snippet needs <algorithm>, <iostream>, <iterator> and <string>, plus the container header where one is used.

Fixed Array

std::string str = "pooaatat";

// Prints "poat"
short count[256] = {0};
std::copy_if(str.begin(), str.end(), std::ostream_iterator<char>(std::cout),
             [&](unsigned char c) { return count[c]++ == 0; });

Count Algorithm + Iterator

std::string str = "pooaatat";

// Prints "poat"
std::string::iterator iter = str.begin();
std::copy_if(str.begin(), str.end(), std::ostream_iterator<char>(std::cout),
             [&](char c) { return !std::count(str.begin(), iter++, c); });

Unordered Set

std::string str = "pooaatat";

// Prints "poat"
std::unordered_set<char> container;
std::copy_if(str.begin(), str.end(), std::ostream_iterator<char>(std::cout),
             [&](char c) { return container.insert(c).second; });

Unordered Map

std::string str = "pooaatat";

// Prints "poat"
std::unordered_map<char, int> container;
std::copy_if(str.begin(), str.end(), std::ostream_iterator<char>(std::cout),
             [&](char c) { return container[c]++ == 0; });
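The snippets above write straight to std::cout. If the result is needed as a string, the same copy_if pattern can probably target std::back_inserter instead. A minimal sketch based on the unordered set variant; the function name dedupe_to_string is made up for illustration and does not come from the answer:

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <unordered_set>

// Hypothetical helper: returns the input with every repeated character
// dropped, keeping the first occurrence of each one.
std::string dedupe_to_string(const std::string& str) {
    std::unordered_set<char> seen;
    std::string result;
    // insert() reports via .second whether the character was new.
    std::copy_if(str.begin(), str.end(), std::back_inserter(result),
                 [&](char c) { return seen.insert(c).second; });
    return result;
}
```

Called as dedupe_to_string("pooaatat"), this should give "poat".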

AFAIK, there is no built-in algorithm for doing this. The std::unique algorithm is suitable only if you want to remove consecutive duplicate characters.
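To show why std::unique alone is not enough here, a short sketch using the usual erase/unique combination; the function name collapse_adjacent is only illustrative:

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: collapses runs of identical adjacent characters.
// Repeats that are not adjacent are left untouched.
std::string collapse_adjacent(std::string text) {
    // std::unique moves the kept characters to the front and returns the
    // new logical end; erase() then trims the leftover tail.
    text.erase(std::unique(text.begin(), text.end()), text.end());
    return text;
}
```

For "pooaatat" this yields "poatat" rather than the desired "poat", because the second a and t are not next to their first occurrences.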

However, you can use the following simple approach:

If the string contains only ASCII characters, you can use a boolean array A[256] that records whether each character has already been encountered (256 entries cover every value of an 8-bit char).

Then simply traverse the input string and copy each character to the output if A[character] is still 0, setting A[character] = 1 as you go.
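The two steps above could be sketched roughly as follows. This assumes an 8-bit char; the function name remove_repeats_ascii and the variable names are illustrative, with seen playing the role of A:

```cpp
#include <string>

// Hypothetical helper: one pass over the input, one lookup per character.
std::string remove_repeats_ascii(const std::string& input) {
    bool seen[256] = {false};  // A[256] in the description above
    std::string output;
    output.reserve(input.size());
    for (char ch : input) {
        // Cast first: plain char may be signed, and a negative index
        // would read outside the array.
        unsigned char index = static_cast<unsigned char>(ch);
        if (!seen[index]) {
            seen[index] = true;
            output.push_back(ch);
        }
    }
    return output;
}
```

The cast to unsigned char is the one detail that is easy to miss; without it, characters above 127 could index before the start of the array on platforms where char is signed.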

If the string contains arbitrary characters, you can use a std::unordered_map or a std::map from char to int instead.
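For wider character types a 256-entry array no longer fits, which is where the map comes in. A possible sketch, written as a template so it works with std::wstring or std::u32string as well; the name remove_repeats_generic is hypothetical, and treating "arbitrary characters" as wide string types is an assumption rather than something the answer spells out:

```cpp
#include <string>
#include <unordered_map>

// Hypothetical helper: counts occurrences in a hash map keyed by the
// string's own character type and keeps only first occurrences.
template <typename String>
String remove_repeats_generic(const String& input) {
    std::unordered_map<typename String::value_type, int> count;
    String output;
    for (auto ch : input) {
        // operator[] default-initialises a missing count to 0.
        if (count[ch]++ == 0) {
            output.push_back(ch);
        }
    }
    return output;
}
```

Swapping std::unordered_map for std::map should also work, at the cost of logarithmic rather than average constant-time lookups.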

Built-in regular expressions should be efficient, for example:

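#include <iostream>
#include <regex>
#include <string>
[...]

// Matches a word character or space that is NOT followed by itself
const std::regex pattern("([\\w ])(?!\\1)");
std::string s = "ssha3akjssss42jj 234444 203488842882387 heeelloooo";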
std::string result;

for (std::sregex_iterator i(s.begin(), s.end(), pattern), end; i != end; ++i)
    result.append((*i)[1]);

std::cout << result << std::endl;

Of course, you can modify the capturing group to suit your needs. Note that this pattern keeps a character only when the next character differs, so it collapses runs of adjacent duplicates rather than removing every repeat, and it skips characters outside the group. The good thing is that it is already supported in Visual Studio 2010 (TR1). gcc 4.8, however, seems to have a problem with regex iterators.
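As a variation on the regex idea, the iterator loop can probably be replaced by a single std::regex_replace call. This is a sketch, not the answer's own code: the pattern (.)\1+ and the function name collapse_runs are assumptions made here, and like the pattern above it only handles adjacent duplicates.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: replaces every run of two or more identical
// characters with a single copy of that character.
std::string collapse_runs(const std::string& text) {
    // (.) captures one character, \1+ matches one or more repeats of it.
    static const std::regex run("(.)\\1+");
    // "$1" in the format string stands for the captured character.
    return std::regex_replace(text, run, "$1");
}
```

Unlike the capturing group above, this version keeps characters that are not word characters or spaces, since . matches any character except a line terminator.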
