I want to get all digits in a std::string but without using a loop (myself; what the code I'm calling uses, I don't mind). An alternative view of the request is: remove all non-digits from the string, leaving only the digits. I know that I can find all digits in a string using code like this:
std::string get_digits(std::string input) {
    std::string::size_type next_digit(0u);
    // Resume each search at pos; without the second argument the search
    // would restart at the beginning and never terminate.
    for (std::string::size_type pos(0u);
         input.npos != (pos = input.find_first_of("0123456789", pos));
         ++pos) {
        input[next_digit++] = input[pos];
    }
    input.resize(next_digit);
    return input;
}
However, this function uses a loop. std::string doesn't provide a function find_all() or something! Ideally, the string is manipulated in-place (the code above moves it but it is easily changed to take a reference).
When there are multiple alternatives, I promise to post profiling results of how well the different approaches work on some lengthy text.
One way would be to use std::copy_if (or std::remove_if):
std::string get_digits(std::string input) {
    std::string result;
    std::copy_if(
        input.begin(),
        input.end(),
        std::back_inserter(result),
        [](char c) { return '0' <= c && c <= '9'; });
    return result;
}
Obviously this uses a loop internally, but you said you don't care about that...
Edit: With std::remove_if:
std::string get_digits_remove(std::string input) {
    auto itErase = std::remove_if(
        input.begin(),
        input.end(),
        [](char c) { return !('0' <= c && c <= '9'); });
    input.erase(itErase, input.end());
    return input;
}
Although I primarily had hoped for 5 quick answers (which wasn't achieved, sigh) the answers and comments led to some interesting approaches I hadn't thought of myself. My personal expectation had been that the answers effectively would result in:
If you want to be fast, use
input.erase(std::remove_if(input.begin(), input.end(), [](unsigned char c){ return !std::isdigit(c); }), input.end());
If you want to be concise, use
text = std::regex_replace(text, std::regex(R"(\D)"), "");
Instead, there were a number of approaches I hadn't even considered:
Use a recursive function!
Use std::partition(), which seems to require extra work (retaining the characters which will be thrown out) and changes the order.
Use std::stable_partition(), which seems to require even more work but doesn't change the order.
Use std::sort() and extract the substring with the relevant characters, although I don't know how to make that one retain the original sequence of characters. Just using a stable version doesn't quite do it.
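To make the difference between the two partition variants concrete, here is a minimal sketch. The names is_digit, digits_stable and digits_unordered are made up for illustration and are not taken from any of the answers.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Shared predicate; taking unsigned char avoids undefined behavior when
// std::isdigit() would otherwise receive a negative char value.
inline bool is_digit(unsigned char c) { return std::isdigit(c) != 0; }

// Digits are moved to the front and keep their original relative order.
std::string digits_stable(std::string s) {
    s.erase(std::stable_partition(s.begin(), s.end(), is_digit), s.end());
    return s;
}

// Digits are moved to the front, but their relative order is unspecified.
std::string digits_unordered(std::string s) {
    s.erase(std::partition(s.begin(), s.end(), is_digit), s.end());
    return s;
}
```

Both functions return the same set of digits; only digits_stable is guaranteed to return them in the order in which they appeared in the input.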
Putting the different approaches together and using a number of variations on how to classify the characters led to a total of 17 versions of roughly the same operation (the code is on github). Most of the versions use std::remove_if() and std::string::erase() but differ in the classification of digits.
remove_if() with [](char c){ return d.find(c) == d.npos; }
remove_if() with [](char c){ return std::find(d.begin(), d.end(), c) == d.end(); }
remove_if() with [](char c){ return !std::binary_search(d.begin(), d.end(), c); }
remove_if() with [](char c){ return !('0' <= c && c <= '9'); }
remove_if() with [](unsigned char c){ return !std::isdigit(c); } (the char is passed as unsigned char to avoid undefined behavior in case c is a char with a negative value)
remove_if() with std::not1(std::ptr_fun(static_cast<int(*)(int)>(&std::isdigit))) (the cast is necessary to determine the correct overload: std::isdigit() happens to be overloaded)
remove_if() with [&](char c){ return !hash.count(c); }
remove_if() with [&](char c){ return filter[c]; } (the code initializing the table actually uses a loop)
remove_if() with [&](char c){ return !std::isdigit(c, locale); }
remove_if() with [&](char c){ return !ctype.is(std::ctype_base::digit, c); }
str.erase(std::partition(str.begin(), str.end(), [](unsigned char c){ return std::isdigit(c) != 0; }), str.end())
str.erase(std::stable_partition(str.begin(), str.end(), [](unsigned char c){ return std::isdigit(c) != 0; }), str.end())
the copy_if() approach described in one of the answers
text = std::regex_replace(text, std::regex(R"(\D)"), ""); (I didn't manage to get this to work on icc)
I have run the benchmark on a MacOS notebook. Since results like this are reasonably easy to graph with Google Charts, here is a graph of the results (although with the versions using regexps removed, as these would cause the graph to scale such that the interesting bit isn't really visible). The results of the benchmarks in the form of a table:
     test                           clang      gcc      icc
 1   use_remove_if_str_find         22525    26846    24815
 2   use_remove_if_find             31787    23498    25379
 3   use_remove_if_binary_search    26709    27507    37016
 4   use_remove_if_compare           2375     2263     1847
 5   use_remove_if_ctype             1956     2209     2218
 6   use_remove_if_ctype_ptr_fun     1895     2304     2236
 7   use_remove_if_hash             79775    60554    81363
 8   use_remove_if_table             1967     2319     2769
 9   use_remove_if_locale_naive     17884    61096    21301
10   use_remove_if_locale            2801     5184     2776
11   use_partition                   1987     2260     2183
12   use_stable_partition            7134     4085    13094
13   use_sort                       59906   100581    67072
14   use_copy_if                     3615     2845     3654
15   use_recursive                   2524     2482     2560
16   regex_build                   758951   531641
17   regex_prebuild                775450   519263
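For readers who want to take similar measurements themselves, the following is a rough sketch of a timing harness. It is not the benchmark code from the github repository. The helper time_variant, its parameters, and the choice of microseconds are assumptions made for illustration (the table above does not state its units).

```cpp
#include <algorithm>
#include <chrono>
#include <string>

// The "compare" variant: a plain range comparison inside remove_if().
void strip_compare(std::string& s) {
    s.erase(std::remove_if(s.begin(), s.end(),
                           [](char c) { return !('0' <= c && c <= '9'); }),
            s.end());
}

// Hypothetical helper: runs `strip` on a fresh copy of `text` `rounds` times
// and returns the total time spent inside `strip`, in microseconds.
template <class Strip>
long long time_variant(const std::string& text, int rounds, Strip strip) {
    using clock = std::chrono::steady_clock;
    long long total = 0;
    for (int i = 0; i < rounds; ++i) {
        std::string copy = text;  // the copy is made outside the timed region
        auto start = clock::now();
        strip(copy);
        auto stop = clock::now();
        total += std::chrono::duration_cast<std::chrono::microseconds>(
                     stop - start).count();
    }
    return total;
}
```

A call such as time_variant(text, 100, strip_compare) would then give one number per variant. A serious benchmark would also need warm-up runs and some protection against the optimizer discarding unused results.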
You can do this in-place with std::partition:
std::string get_digits(std::string& input)
{
    // unsigned char avoids undefined behavior for negative char values
    auto split =
        std::partition( std::begin(input), std::end(input), [](unsigned char c){return ::isdigit(c);} );
    size_t len = std::distance( std::begin(input), split );
    input.resize( len );
    return input;
}
std::partition does not guarantee order, so if order matters, use std::stable_partition.
I would start with a nice primitive function that composes the std algorithms you want to use:
template<class Container, class Test>
void erase_remove_if( Container&& c, Test&& test ) {
    using std::begin; using std::end;
    auto it = std::remove_if( begin(c), end(c), std::forward<Test>(test) );
    c.erase( it, end(c) );
}
Then we write save_digits:
std::string save_digits( std::string s ) {
    erase_remove_if( s,
        [](char c){
            if (c > '9') return true;
            return c < '0';
        }
    );
    return s;
}
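Since C++20 the standard library ships this composition for std::string as std::erase_if. A minimal sketch, assuming a C++20 standard library and falling back to the erase-remove idiom otherwise (the function name keep_digits is made up for illustration):

```cpp
#include <algorithm>
#include <string>

// Removes every character that is not an ASCII digit, in place.
void keep_digits(std::string& s) {
    auto not_digit = [](char c) { return c < '0' || c > '9'; };
#if defined(__cpp_lib_erase_if)
    std::erase_if(s, not_digit);  // C++20: a single call, no iterator pair
#else
    // Pre-C++20 fallback: the classic erase-remove idiom.
    s.erase(std::remove_if(s.begin(), s.end(), not_digit), s.end());
#endif
}
```

Both branches leave the digits in their original order; the feature-test macro only selects which spelling is compiled.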
Maybe the simple answer suffices?
std::string only_the_digits(std::string s)
{
    s.erase(std::remove_if(s.begin(), s.end(),
                           [](char c) { return !::isdigit(c); }), s.end());
    return s;
}
The downside of this approach is that it unconditionally creates a copy of the input data. If there are lots of digits, then that's OK, since we're reusing that object. Alternatively, you can make this function just modify the string in-place (void strip_non_digits(std::string &)).
But if there are only a few digits and you want to leave the input untouched, then you may prefer to create a new (small) output object and not copy the input. This can be done with a referential view of the input string, e.g. as provided by the Fundamentals TS, and using copy_if:
std::string only_the_digits(std::experimental::string_view sv)
{
    std::string result;
    // back_inserter needs the destination; the predicate is the fourth argument
    std::copy_if(sv.begin(), sv.end(), std::back_inserter(result), ::isdigit);
    return result;
}
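The Fundamentals TS string_view was later standardized as std::string_view in C++17. Assuming a C++17 compiler, the same idea can be sketched as follows (the name only_the_digits_sv is chosen here only to avoid clashing with the functions above):

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <string_view>

// Copies only the ASCII digits of `sv` into a new string; the input is
// neither copied as a whole nor modified.
std::string only_the_digits_sv(std::string_view sv) {
    std::string result;
    std::copy_if(sv.begin(), sv.end(), std::back_inserter(result),
                 [](char c) { return '0' <= c && c <= '9'; });
    return result;
}
```

Because std::string_view converts implicitly from both std::string and string literals, callers can pass either without an intermediate copy.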
// terrible no-loop solution
void getDigs(const char* inp, char* dig)
{
    if (!*inp)
    {
        *dig = 0;  // terminate the output, even if no digit was found
        return;
    }
    if (*inp >= '0' && *inp <= '9')
    {
        *dig = *inp;
        dig++;
    }
    getDigs(inp + 1, dig);
}
No loop solution in 4 steps (but with error checking, more than 4 statements):
1) sort the string, using a suitable sort (in increasing order) ... now all digits will be together, concatenated
2) use std::string::find_first_of() to find the index of the first digit (be sure to check that a digit was found)
3) use std::string::find_last_of() to find the index of the last digit (be sure to check that a digit was found)
4) use std::string::substr() and the 2 previous indexes to extract the digits
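The four steps above can be sketched as follows. The function name digits_by_sorting is made up for illustration, and the digits come back in sorted order rather than in their original order.

```cpp
#include <algorithm>
#include <string>

// Sort-based extraction: after sorting, the characters '0'..'9' form one
// contiguous run because their character codes are consecutive.
std::string digits_by_sorting(std::string s) {
    const char* digits = "0123456789";
    std::sort(s.begin(), s.end());                            // step 1
    std::string::size_type first = s.find_first_of(digits);   // step 2
    if (first == std::string::npos) {
        return std::string();                                 // no digit at all
    }
    std::string::size_type last = s.find_last_of(digits);     // step 3
    return s.substr(first, last - first + 1);                 // step 4
}
```

Once find_first_of() has found a digit, find_last_of() is guaranteed to find one too, so a single check covers both lookups.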
This is about as succinct as I can get it I think.
std::string get_digits(std::string input)
{
    input.erase(std::stable_partition(
                    std::begin(input),
                    std::end(input),
                    ::isdigit),
                std::end(input));
    return input;
}
Features:
This would be the stl-style iterator-based approach:
template<class InIter, class OutIter>
OutIter collect_digits(InIter first, InIter last, OutIter first_out)
{
    return std::copy_if(first, last, first_out, ::isdigit);
}
This has a number of advantages:
fun example:
#include <iostream>
#include <vector>
#include <string>
#include <algorithm>
#include <iterator>
#include <cctype>   // declares isdigit

template<class InIter, class OutIter>
OutIter collect_digits(InIter first, InIter last, OutIter first_out)
{
    return std::copy_if(first, last, first_out, ::isdigit);
}

using namespace std;

int main()
{
    char chunk1[] = "abc123bca";
    string chunk2 { "def456fed" };
    vector<char> chunk3 = { 'g', 'h', 'i', '7', '8', '9', 'i', 'h', 'g' };

    // collect the digits of all three sources into one string
    string result;
    auto pos = collect_digits(begin(chunk1), end(chunk1), back_inserter(result));
    pos = collect_digits(begin(chunk2), end(chunk2), pos);
    collect_digits(begin(chunk3), end(chunk3), pos);
    cout << "first collect: " << result << endl;

    // same again, but chained and written straight to cout
    cout << "second collect: ";
    collect_digits(begin(chunk3),
                   end(chunk3),
                   collect_digits(begin(chunk2),
                                  end(chunk2),
                                  collect_digits(begin(chunk1),
                                                 end(chunk1),
                                                 ostream_iterator<char>(cout))));
    cout << endl;
    return 0;
}
I use this one-line macro as long as #include <regex> appears before it; otherwise, include it yourself:
#define DIGITS_IN_STRING(a) std::regex_replace(a, std::regex(R"([\D])"), "")
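If a macro is not wanted, roughly the same one-liner can be written as an inline function. This is a sketch with a made-up name (digits_in_string). It also builds the regex only once, which the macro does not do.

```cpp
#include <regex>
#include <string>

// Returns a copy of `s` with every non-digit character removed.
inline std::string digits_in_string(const std::string& s) {
    // \D matches any character that is not a digit; the regex is constructed
    // on the first call only and reused afterwards.
    static const std::regex non_digit(R"(\D)");
    return std::regex_replace(s, non_digit, "");
}
```

Unlike the macro, the function evaluates its argument exactly once and has an ordinary type-checked signature.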