std::remove_if GCC implementation isn't efficient?

From another question here, there seems to be evidence that GCC's implementation of std::remove_if is not as efficient as the following implementation:

'Raw homebrew' solution:

static char str1[100] = "str,, ing";
size_t size = sizeof(str1)/sizeof(str1[0]);

int bad = 0;
int cur = 0;
while (str1[cur] != '\0') {
    if (bad < cur && !ispunct(str1[cur]) && !isspace(str1[cur])) {
        str1[bad] = str1[cur];
    }
    if (ispunct(str1[cur]) || isspace(str1[cur])) {
        cur++;
    } else {
        cur++;
        bad++;
    }
}
str1[bad] = '\0';

Timing outputs:

0.106860

Sample benchmarking code for std::remove_if for a solution of the same problem:

bool is_char_category_in_question(const char& c) {
    return std::ispunct(c) || std::isspace(c);
}

std::remove_if(&str1[0], &str1[size-1], is_char_category_in_question);

Timing outputs:

1.986838

Please run the code at the ideone links above to get actual runtime results (posting the full code here would obscure the question).

The execution times from the samples seem to confirm that the first implementation performs much better.

Can anyone explain why the std::remove_if() algorithm doesn't (or can't) provide a similarly efficient solution for the given problem?

Looks to me as though you're running remove_if on a range of 100 characters since size is 100, but the "homebrew" runs until you find the nul terminator (which is only 10 characters in).
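To make the two versions process the same data, the range passed to remove_if can be bounded by the string length instead of the buffer size. The following is only a minimal sketch of that idea: the helper names is_punct_or_space and strip_punct_space are hypothetical and not from the original post, and the cast to unsigned char is added because the <cctype> functions are only defined for values representable as unsigned char (or EOF).

```cpp
#include <algorithm>
#include <cctype>
#include <cstring>

// Hypothetical predicate (not from the original post).
inline bool is_punct_or_space(char c) {
    // <cctype> functions require a value representable as unsigned char.
    const unsigned char uc = static_cast<unsigned char>(c);
    return std::ispunct(uc) || std::isspace(uc);
}

// Hypothetical helper: strips punctuation and whitespace from a
// nul-terminated buffer in place.
inline void strip_punct_space(char* s) {
    char* end = s + std::strlen(s);  // stop at the terminator, not at the buffer size
    char* new_end = std::remove_if(s, end, is_punct_or_space);
    *new_end = '\0';                 // remove_if does not re-terminate the buffer
}
```

Called on a buffer holding "str,, ing", this leaves "string" in the buffer, which is the same result the homebrew loop produces.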

Dealing with that using the change in your comment below, on GCC with -O2 I still see a difference of about a factor of 2, with remove_if being slower. Changing to:

struct is_char_category_in_question {
    bool operator()(const char& c) const {
        return std::ispunct(c) || std::isspace(c);
    }
};

gets rid of almost all of this difference, although there may still be a <10% difference. So that looks to me like a quality-of-implementation issue: whether or not the test gets inlined. I haven't checked the assembly to confirm, though.
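For context on why the functor can help: passing a function name instantiates std::remove_if with a function-pointer type, so the predicate is called through a pointer unless the optimizer can see through it, whereas a functor or lambda has its own type whose operator() is known at compile time. A lambda gives the same effect as the struct above with less boilerplate. This is a sketch only; strip_with_lambda is a hypothetical name, and whether it actually closes the gap depends on the compiler and flags.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper for illustration: the same predicate as the struct
// above, written as a lambda so the call target is part of the closure type.
inline std::string strip_with_lambda(std::string s) {
    auto pred = [](char c) -> bool {
        const unsigned char uc = static_cast<unsigned char>(c);
        return std::ispunct(uc) || std::isspace(uc);
    };
    // Erase the tail that remove_if leaves behind.
    s.erase(std::remove_if(s.begin(), s.end(), pred), s.end());
    return s;
}
```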

Since your test harness means that no characters are actually removed after the first pass, I'm not troubled by a 10% difference. I'm a bit surprised, but not enough to really get into it. YMMV :-)

You can try to use the erase-remove idiom for a small improvement.

std::string str{str1};
for (long i = 0; i < 999999L; ++i) {
    str.erase( std::remove_if(std::begin(str), std::end(str), is_char_category_in_question), std::end(str) );
}

This is combined with a fix for the other issue Steve Jessop mentioned: I replaced size - 1 with 10, but you can use strlen if you prefer. For this test, my Makefile looks like:

compile:
    g++ test.cpp -o test -std=c++11 -O3
    g++ test2.cpp -o test2 -std=c++11 -O3
    g++ test3.cpp -o test3 -std=c++11 -O3
run:
    perf stat -r 10 ./test
    perf stat -r 10 ./test2
    perf stat -r 10 ./test3

test is the erase-remove version, test2 is the remove_if version, and test3 is the other version. Here are the results:

 Performance counter stats for './test' (10 runs):

       0.035699861 seconds time elapsed                                          ( +-  2.30% )

perf stat -r 10 ./test2

 Performance counter stats for './test2' (10 runs):

       0.050991938 seconds time elapsed                                          ( +-  2.96% )

perf stat -r 10 ./test3

 Performance counter stats for './test3' (10 runs):

       0.038070704 seconds time elapsed                                          ( +-  2.34% )

I omitted the verbose information, and I only ran it 10 times. You can try the tests yourself for a better interpretation of the results.
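If perf is not available, a rough in-process measurement with std::chrono::steady_clock is one alternative. The sketch below is not the harness used for the numbers above; time_erase_remove and its parameters are hypothetical, and it copies the input on every iteration so that each pass really removes characters (the copy is included in the measured time).

```cpp
#include <algorithm>
#include <cctype>
#include <chrono>
#include <string>

// Hypothetical harness: runs erase-remove `iterations` times on a fresh copy
// of `input`, stores the last result in `last_result`, and returns the
// elapsed wall-clock time in seconds.
inline double time_erase_remove(const std::string& input, long iterations,
                                std::string& last_result) {
    auto pred = [](char c) -> bool {
        const unsigned char uc = static_cast<unsigned char>(c);
        return std::ispunct(uc) || std::isspace(uc);
    };
    const auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < iterations; ++i) {
        std::string s = input;  // fresh copy; its cost is part of the timing
        s.erase(std::remove_if(s.begin(), s.end(), pred), s.end());
        last_result.swap(s);    // keep the result observable to the caller
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Numbers from a loop like this are only indicative: they vary between runs and depend heavily on optimization level, so repeated runs (as perf stat -r does) are still advisable.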

Why not use a lambda?

To remove the even ints from vector v:

#include <algorithm>
#include <iostream>
#include <vector>

int main (int argc, char* argv []) {
  int tmp [5] = {1, 2, 4, 5, 7};
  int* p ((int*) tmp);
  std::vector<int> v (p, p+5);
  std::cout << "v init :";
  for (auto i (v.begin ()); i != v.end (); ++i) std::cout << *i << " ";
  std::cout << std::endl;

  auto i (v.begin ());
  std::for_each (v.begin (), v.end (), [&i] (const int s) {
    if (s%2) *i++ = s;
  });

  if (i != v.end ()) v.erase (i, v.end ());

  std::cout << "v odd :";
  for (auto i (v.begin ()); i != v.end (); ++i) std::cout << *i << " ";
  std::cout << std::endl;
}

Output:

v init :1 2 4 5 7 
v odd :1 5 7 
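For comparison, the same result can be obtained by giving the lambda directly to std::remove_if and erasing the tail, which ties back to the erase-remove idiom discussed earlier. This is a sketch rather than a drop-in replacement for the code above; remove_even is a hypothetical helper name.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: removes the even values from v in place using the
// erase-remove idiom with a lambda predicate.
inline void remove_even(std::vector<int>& v) {
    v.erase(std::remove_if(v.begin(), v.end(),
                           [](int x) { return x % 2 == 0; }),
            v.end());
}
```

Applied to the vector {1, 2, 4, 5, 7}, this leaves {1, 5, 7}, matching the output shown above.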
