简体   繁体   中英

How can I concisely find all digits in a string without using a loop?

I want to get all digits in a std::string but without using a loop (myself; what the code I'm calling uses, I don't mind). An alternative view of the request is: remove all non-digits from the string, leaving only the digits. I know that I can find all digit in a string using code like this:

std::string get_digits(std::string input) {
    std::string::size_type next_digit(0u);
    for (std::string::size_type pos(0u);
         input.npos != (pos = input.find_first_of("0123456789"));
         ++pos) {
        input[next_digit++] = input[pos];
    }
    input.resize(next_digit);
    return input;
}

However, this function uses a loop. std::string doesn't provide a function find_all() or something! Ideally, the string is maniulated in-place (the code above moves it but it is easily changed to take a reference).

When there are multiple alternatives, I'll promise to post profiling results of how good the different approaches work on some lengthy text.

One way would be to use std::copy_if (or std::remove_if ):

std::string get_digits(std::string input) {
    std::string result;
    std::copy_if(
        input.begin(), 
        input.end(), 
        std::back_inserter(result), 
        [](char c) { return '0' <= c && c <= '9'; });
    return result;
}

Obviously this uses a loop internally, but you said you don't care about that...

Edit: With std::remove_if :

std::string get_digits_remove(std::string input) {
    auto itErase = std::remove_if(
        input.begin(), 
        input.end(), 
        [](char c) { return !('0' <= c && c <= '9'); });
    input.erase(itErase, input.end());
    return input;
}

Although I primarily had hoped for 5 quick answers (which wasn't achieved, sigh) the answers and comments led to some interesting approaches I hadn't thought of myself. My personal expectation had been that the answers effectively would result in:

  • If you want to be fast, use

     input.erase(std::remove_if(input.begin(), input.end(), [](unsigned char c){ return !std::isdigit(c); }), input.end()); 
  • If you want to be concise, use

     text = std::regex_replace(text, std::regex(R"(\\D)"), ""); 

Instead, there were a number of approaches I hadn't even considered:

  • Use a recursive function!

  • Use std::partition() which seems to require extra work (retain the characters which will be thrown out) and changes the order.

  • Use std::stable_partition() which seems to require even more work but doesn't change the order.

  • Use std::sort() and extract the substring with the relevant characters although I don't know how to make that one retain the original sequence of character. Just using a stable version doesn't quite to it.

Putting the different approaches together and using a number of variations on how to classify the characters, led to a total of 17 version of roughly the same operation (the code is on github ). Most of the versions use std::remove_if() and std::string::erase() but differ in the classification of digits.

  1. remove_if() with [](char c){ return d.find(c) == d.npos; }) [](char c){ return d.find(c) == d.npos; }) .
  2. remove_if() with [](char c){ return std::find(d.begin(), d.end(), c) == d.end(); } [](char c){ return std::find(d.begin(), d.end(), c) == d.end(); }
  3. remove_if() with [](char c){ return !std::binary_search(d.begin(), d.end()); } [](char c){ return !std::binary_search(d.begin(), d.end()); }
  4. remove_if() with [](char c){ return '0' <= c && c <= '9'; } [](char c){ return '0' <= c && c <= '9'; }
  5. remove_if() with [](unsigned char c){ return !std::isdigit(c); } [](unsigned char c){ return !std::isdigit(c); } (the char is passed as unsigned char to avoid undefined behavior in case c is a char with a negative value)
  6. remove_if() with std::not1(std::ptr_fun(std::static_cast<int(*)(int)>(&std::isdigit))) (the cast is necessary to determine the correct overload: std::isdigit() happens to be overloaded).
  7. remove_if() with [&](char c){ return !hash.count(c); } [&](char c){ return !hash.count(c); }
  8. remove_if() with [&](char c){ return filter[c]; } [&](char c){ return filter[c]; } (the code initializing actually uses a loop)
  9. remove_if() with [&](char c){ return std::isidigit(c, locale); } [&](char c){ return std::isidigit(c, locale); }
  10. remove_if() with [&](char c){ return ctype.is(std::ctype_base::digit, c); } [&](char c){ return ctype.is(std::ctype_base::digit, c); }
  11. str.erase(std::parition(str.begin(), str.end(), [](unsigned char c){ return !std::isdigit(c); }), str.end())
  12. str.erase(std::stable_parition(str.begin(), str.end(), [](unsigned char c){ return !std::isdigit(c); }), str.end())
  13. the "sort-approach" describe in one of the answers
  14. the copy_if() approach described in one of the answers
  15. the recursive approach describe in one of the answers
  16. text = std::regex_replace(text, std::regex(R"(\\D)"), ""); (I didn't manage to get this to work on icc)
  17. like 16 but with an already built regular expression

I have run the benchmark on a MacOS notebook. Since results like this are reasonably easy to graph with Google Chars, here is a graph of the results (although with the versions using regexps removed as these would cause the graph to scale such that the interesting bit isn't really visible). The results of the benchmarks in form of a table:

    test                          clang   gcc     icc
 1  use_remove_if_str_find        22525   26846  24815
 2  use_remove_if_find            31787   23498  25379
 3  use_remove_if_binary_search   26709   27507  37016
 4  use_remove_if_compare          2375    2263   1847
 5  use_remove_if_ctype            1956    2209   2218
 6  use_remove_if_ctype_ptr_fun    1895    2304   2236
 7  use_remove_if_hash            79775   60554  81363
 8  use_remove_if_table            1967    2319   2769
 9  use_remove_if_locale_naive    17884   61096  21301
10  use_remove_if_locale           2801    5184   2776
11  use_partition                  1987    2260   2183
12  use_stable_partition           7134    4085  13094
13  use_sort                      59906  100581  67072
14  use_copy_if                    3615    2845   3654
15  use_recursive                  2524    2482   2560
16  regex_build                  758951  531641 
17  regex_prebuild               775450  519263

You can do this in-place with std::partition :

std::string get_digits(std::string& input)
{
    auto split =
        std::partition( std::begin(input), std::end(input), [](char c){return ::isdigit(c);} );

    size_t len = std::distance( std::begin(input), split );
    input.resize( len );
    return input;
}

std::partition does not guarantee order, so if order matters, use std::stable_partition

I would start with a nice primitive function that composes the std algorithms you want to use:

template<class Container, class Test>
void erase_remove_if( Container&& c, Test&& test ) {
  using std::begin; using std::end;
  auto it = std::remove_if( begin(c), end(c), std::forward<Test>(test) );
  c.erase( it, end(c) );
}

then we write save digits:

std::string save_digits( std::string s ) {
  erase_remove_if( s,
    [](char c){
      if (c > '9') return true;
      return c < '0';
    }
  );
  return s;
}

Maybe the simple answer suffices?

std::string only_the_digits(std::string s)
{
    s.erase(std::remove_if(s.begin(), s.end(),
                           [](char c) { return !::isdigit(c); }), s.end());
    return s;
}

The downside of this approach is that it unconditionally creates a copy of the input data. If there are lots of digits, then that's OK, since we're reusing that object. Alternatively, you can make this function just modify the string in-place ( void strip_non_digits(std::string &) .)

But if there are only few digits and you want to leave the input untouched, then you may prefer to create a new (small) output object and not copy the input. This can be done with a referential view of the input string, eg as provided by the Fundamentals TS, and using copy_if :

std::string only_the_digits(std::experimental::string_view sv)
{
    std::string result;
    std::copy_if(sv.begin(), sv.end(), std::back_inserter(::isdigit));
    return result;
}
// terrible no-loop solution
void getDigs(const char* inp, char* dig)
{
    if (!*inp)
        return;
    if (*inp>='0' && *inp<='9')
    {
        *dig=*inp; 
        dig++;
        *dig=0;
    }
    getDigs(inp+1,dig);
}

No loop solution in 4 steps (but with error checking, more than 4 statements):

1) sort the string, using a suitable sort (incrementing order) ... now all digits will be together, concatentated

2) use std::string.find_first_of() to find the index of the first digit (be sure to check for a digit found)

3) use std::string.find_last_of() to find the index of the last digit (be sure to check for a digit found)

4) use std::string::substr() and the 2 previous indexes to extract the digits

This is about as succinct as I can get it I think.

std::string get_digits(std::string input)
{
    input.erase(std::stable_partition(
                               std::begin(input),
                               std::end(input),
                               ::isdigit),
                std::end(input));

    return input;
}

Features:

  1. passes sink argument by value to take advantage of copy elision in c++11
  2. preserves order of digits.
  3. No user code - uses only peer-reviewed stl functions. Chance of bugs - zero.

This would be the stl-style iterator-based approach:

template<class InIter, class OutIter>
OutIter collect_digits(InIter first, InIter last, OutIter first_out)
{
    return std::copy_if(first, last, first_out, ::isdigit);
}

This has a number of advantages:

  1. input can be any iterable range of chars, not just strings
  2. can be chained by virtue of returning the output iterator
  3. allows destination container/iterator (including an ostream_iterator)
  4. with a little love, it could be made to handle unicode chars etc

fun example:

#include <iostream>
#include <vector>
#include <string>
#include <algorithm>
#include <iterator>

template<class InIter, class OutIter>
OutIter collect_digits(InIter first, InIter last, OutIter first_out)
{
    return std::copy_if(first, last, first_out, ::isdigit);
}

using namespace std;

int main()
{
    char chunk1[] = "abc123bca";
    string chunk2 { "def456fed" };
    vector<char> chunk3 = { 'g', 'h', 'i', '7', '8', '9', 'i', 'h', 'g' };

    string result;
    auto pos = collect_digits(begin(chunk1), end(chunk1), back_inserter(result));
    pos = collect_digits(begin(chunk2), end(chunk2), pos);
    collect_digits(begin(chunk3), end(chunk3), pos);
    cout << "first collect: " << result << endl;

    cout << "second collect: ";
    collect_digits(begin(chunk3),
                   end(chunk3),
                   collect_digits(begin(chunk2),
                                  end(chunk2),
                                  collect_digits(begin(chunk1),
                                                 end(chunk1),
                                                 ostream_iterator<char>(cout))));
    cout << endl;

    return 0;
}

只要#include <regex>出现在它之前我就使用这个单行宏,否则你包括:

#define DIGITS_IN_STRING(a) std::regex_replace(a, std::regex(R"([\D])"), "")

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM