
String replacement in C++ on string of arbitrary length

I have a string that I get from an ostringstream. I'm currently trying to replace some characters in this string (content.replace(content.begin(), content.end(), "\n", "");), but sometimes I get an exception:

malloc: *** mach_vm_map(size=4294955008) failed (error code=3)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
std::bad_alloc

I suspect that this happens because the string is too big. What's the best practice for these situations? Declare the string on the heap?

Update

My full method:

xml_node HTMLDocument::content() const {
  xml_node html = this->doc.first_child();
  xml_node body = html.child("body");
  xml_node section = body.child("section");
  std::ostringstream oss;
  if (section.type() != xml_node_type::node_null) {
    section.print(oss);
  } else {
    body.print(oss);
  }
  string content;
  content = oss.str();
  content.replace(content.begin(), content.end(), "<section />", "<section></section>");
  content.replace(content.begin(), content.end(), "\t", "");
  xml_node node;
  return node;
}

std::string::replace has no overload that accepts a pair of iterators, a const char* to search for, and a const char* to use as the replacement, and this is where your problem comes from:

content.replace(content.begin(), content.end(), "\n", "");

matches the following overload:

template <class InputIterator>
string& replace(iterator i1, iterator i2,
                InputIterator first, InputIterator last);

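that is, "\n" and "" are treated as the range [first, last), which, depending on the addresses the two literals happen to have, may or may not crash your program. The distance between two unrelated pointers is meaningless, which is presumably also where the huge allocation size in your error message comes from, rather than from the string itself being too big.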

You have to either use std::regex or implement your own logic that iterates through std::string and replaces any encountered pattern with a replacement string.
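As a minimal sketch of the std::regex route, assuming a C++11 compiler: the helper name strip_newlines is made up for illustration, and the pattern here is a single literal newline. Patterns containing regex metacharacters would need escaping first.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not from the question): returns a copy of `input`
// with every newline character removed.
std::string strip_newlines(const std::string& input)
{
    // A literal '\n' in the pattern matches a newline in the input.
    static const std::regex newline("\n");
    // std::regex_replace replaces every match and returns a new string.
    return std::regex_replace(input, newline, "");
}
```

Unlike the iterator overload of replace, this actually searches the string, so the original is left untouched and the result is a new string.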

The lines:

content.replace(content.begin(), content.end(), "<section />", "<section></section>");
content.replace(content.begin(), content.end(), "\t", "");

result in undefined behavior. They match the function:

template<class InputIterator>
std::string& std::string::replace(
    const_iterator i1, const_iterator i2,
    InputIterator j1, InputIterator j2);

with InputIterator resolving to char const*. The problem is that the distance between the two iterators, and whether the second can be reached from the first, is undefined, since they point to totally unrelated bits of memory.

From your code, I don't think you understand what std::string::replace does. It replaces the range [i1, i2) in the string with the text defined by the range [j1, j2). It does not do any search and comparison; it is for use after you have found the range which needs replacing. Calling:

content.replace(content.begin(), content.end(), "<section />", "<section></section>");

has exactly the same effect as:

content = std::string( "<section />", "<section></section>");

, which is certainly not what you want.
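To illustrate the intended division of labour (find locates the range, replace rewrites it), here is a small sketch. The helper name replace_first is hypothetical, and it only handles the first occurrence.

```cpp
#include <string>

// Hypothetical helper: replaces the first occurrence of `from` in `text`
// with `to`. Returns true if something was replaced.
bool replace_first(std::string& text,
                   const std::string& from,
                   const std::string& to)
{
    // Step 1: search. find returns npos when there is no match.
    const std::string::size_type pos = text.find(from);
    if (pos == std::string::npos) {
        return false;
    }
    // Step 2: replace exactly the range that was found.
    text.replace(pos, from.size(), to);
    return true;
}
```

Called as replace_first(content, "<section />", "<section></section>"), this changes only the matched characters and leaves the rest of the string alone.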

In C++11, there's a regex_replace function which may be of some use, although if you're really doing this on very large strings, it may not be the most performant (the added flexibility of regular expressions comes at a price); I'd probably use something like:

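#include <algorithm>
#include <string>

std::string
searchAndReplace(
    std::string const& original,
    std::string const& from,
    std::string const& to)
{
    // An empty pattern matches at every position and would loop forever below.
    if ( from.empty() ) {
        return original;
    }
    std::string results;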
    std::string::const_iterator current = original.begin();
    std::string::const_iterator end = original.end();
    std::string::const_iterator next = std::search( current, end, from.begin(), from.end() );
    while ( next != end ) {
        results.append( current, next );
        results.append( to );
        current = next + from.size();
        next = std::search( current, end, from.begin(), from.end() );
    }
    results.append( current, next );
    return results;
}

For very large strings, some heuristic for guessing the size, and then doing a reserve on results is probably a good idea as well.
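One possible way to size the buffer, sketched under the assumption that a second pass over the input is acceptable, is to count the matches first and compute the exact result length. The helper name replaced_size is made up for illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: length of the string obtained by replacing every
// non-overlapping occurrence of `from` in `original` with `to`.
std::size_t replaced_size(const std::string& original,
                          const std::string& from,
                          const std::string& to)
{
    if (from.empty()) {
        return original.size();  // nothing can be replaced
    }
    std::size_t count = 0;
    // Scan left to right, skipping past each match so matches never overlap.
    for (std::string::size_type pos = original.find(from);
         pos != std::string::npos;
         pos = original.find(from, pos + from.size())) {
        ++count;
    }
    // Each match removes from.size() characters and adds to.size().
    return original.size() - count * from.size() + count * to.size();
}
```

In searchAndReplace, a call such as results.reserve(replaced_size(original, from, to)) before the loop would then avoid reallocations, at the cost of scanning the input twice.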

Finally, since your second line just removes '\t', you'd be better off using std::remove together with erase:

content.erase( std::remove( content.begin(), content.end(), '\t' ), content.end() );

AFAIK STL strings are always allocated on the heap once they go over a certain (small) size, e.g. 15 chars in Visual Studio, so declaring the string on the heap yourself changes nothing.

What you can do if you get allocation exceptions:

  • Use a custom allocator
  • Use a "rope" class.

A bad_alloc might not mean you've run out of memory; more likely you've run out of contiguous memory. A rope class might be better suited to you, as it allocates strings in pieces internally.

This is one of the correct (and reasonably efficient) ways to remove characters from a string if you want to make a copy and leave the original intact:

#include <algorithm>
#include <string>

std::string delete_char(std::string src, char to_remove)
{
    // note: src is a copy so we can mutate it

    // shift the characters we keep to the front; the returned iterator marks where the leftover junk begins
    auto begin_junk = std::remove_if(src.begin(),
                                     src.end(),
                                     [&to_remove](const char c) { return c == to_remove; });
    // chop off all the characters we wanted to remove
    src.erase(begin_junk,
              src.end());

    // src is a by-value parameter, so it is moved into the result automatically
    return src;
}

called like this:

std::string src("a\nb\nc");
auto dest = delete_char(src, '\n');
assert(dest == "abc");

If you'd prefer to modify the string in place then simply:

src.erase(std::remove_if(src.begin(), src.end(), [](char c) { return c == '\n'; }), src.end());
