Segmentation fault in std::less<char>

Question

I have the following code (C++0x):

const set<char> s_special_characters =  { '(', ')', '{', '}', ':' };

void nectar_loader::tokenize( string &line, const set<char> &special_characters )
{
    auto it = line.begin();
    const auto not_found = special_characters.end();

    // first character special case
    if( it != line.end() && special_characters.find( *it ) != not_found )
        it = line.insert( it+1, ' ' ) + 1;

    while( it != line.end() )
    {
        // check if we're dealing with a special character
        if( special_characters.find(*it) != not_found ) // <----------
        {
            // ensure a space before
            if( *(it-1) != ' ' )
                it = line.insert( it, ' ' ) + 1;
            // ensure a space after
            if( (it+1) != line.end() && *(it+1) != ' ' )
                it = line.insert( it+1, ' ');
            else
                line.append(" ");
        }
        ++it;
    }
}

The crash points at the indicated line. This results in a segfault with this gdb backtrace:

#0  0x000000000040f043 in std::less<char>::operator() (this=0x622a40, __x=@0x623610, __y=@0x644000)
    at /usr/lib/gcc/x86_64-unknown-linux-gnu/4.5.2/../../../../include/c++/4.5.2/bits/stl_function.h:230
#1  0x000000000040efa6 in std::_Rb_tree<char, char, std::_Identity<char>, std::less<char>, std::allocator<char> >::_M_lower_bound (this=0x622a40, __x=0x6235f0, __y=0x622a48, __k=@0x644000)
    at /usr/lib/gcc/x86_64-unknown-linux-gnu/4.5.2/../../../../include/c++/4.5.2/bits/stl_tree.h:1020
#2  0x000000000040e840 in std::_Rb_tree<char, char, std::_Identity<char>, std::less<char>, std::allocator<char> >::find (this=0x622a40, __k=@0x644000)
    at /usr/lib/gcc/x86_64-unknown-linux-gnu/4.5.2/../../../../include/c++/4.5.2/bits/stl_tree.h:1532
#3  0x000000000040e4fd in std::set<char, std::less<char>, std::allocator<char> >::find (this=0x622a40, __x=@0x644000)
    at /usr/lib/gcc/x86_64-unknown-linux-gnu/4.5.2/../../../../include/c++/4.5.2/bits/stl_set.h:589
#4  0x000000000040de51 in ambrosia::nectar_loader::tokenize (this=0x7fffffffe3b0, line=..., special_characters=...)
    at ../../ambrosia/Library/Source/Ambrosia/nectar_loader.cpp:146
#5  0x000000000040dbf5 in ambrosia::nectar_loader::fetch_line (this=0x7fffffffe3b0)
    at ../../ambrosia/Library/Source/Ambrosia/nectar_loader.cpp:112
#6  0x000000000040dd11 in ambrosia::nectar_loader::fetch_token (this=0x7fffffffe3b0, token=...)
    at ../../ambrosia/Library/Source/Ambrosia/nectar_loader.cpp:121
#7  0x000000000040d9c4 in ambrosia::nectar_loader::next_token (this=0x7fffffffe3b0)
    at ../../ambrosia/Library/Source/Ambrosia/nectar_loader.cpp:72
#8  0x000000000040e472 in ambrosia::nectar_loader::extract_nectar<std::back_insert_iterator<std::vector<ambrosia::target> > > (this=0x7fffffffe3b0, it=...)
    at ../../ambrosia/Library/Source/Ambrosia/nectar_loader.cpp:43
#9  0x000000000040d46d in ambrosia::drink_nectar<std::back_insert_iterator<std::vector<ambrosia::target> > > (filename=..., it=...)
    at ../../ambrosia/Library/Source/Ambrosia/nectar.cpp:75
#10 0x00000000004072ae in ambrosia::reader::event (this=0x623770)

I'm at a loss, and have no clue where I'm doing something wrong. Any help is much appreciated.

EDIT: the string at the moment of the crash is

sub Ambrosia : lib libAmbrosia

UPDATE:

I replaced the above function following suggestions in comments/answers. Below is the result.

const string tokenize( const string &line, const set<char> &special_characters )
{
    const auto not_found = special_characters.end();
    const auto end = line.end();
    string result;

    if( !line.empty() )
    {
        // copy first character
        result += line[0];

        char previous = line[0];
        for( auto it = line.begin()+1; it != end; ++it )
        {
            const char current = *it;

            if( special_characters.find(previous) != not_found )
                result += ' ';

            result += current;
            previous = current;
        }
    }
    return result;
}

Answer 1

Another guess is that line.append(" ") sometimes invalidates it, depending on the original capacity of line.
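To illustrate the invalidation point, here is a minimal sketch of an in-place variant that tracks the position as an index instead of an iterator. An index stays meaningful when insert() reallocates the buffer, whereas an iterator held across the call may dangle. The function name pad_special_in_place is hypothetical, and this sketch assumes no trailing space is wanted when a special character ends the line, which differs from the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical helper (not from the original post): pads special characters
// with spaces in place, using indices so that reallocation cannot leave us
// holding a dangling iterator.
void pad_special_in_place(std::string &line, const std::set<char> &special_characters)
{
    for (std::size_t i = 0; i < line.size(); ++i)
    {
        if (special_characters.count(line[i]) == 0)
            continue;

        // ensure a space before, unless this is the very first character
        if (i > 0 && line[i - 1] != ' ')
        {
            line.insert(i, 1, ' ');
            ++i; // the special character moved one slot to the right
        }
        assert(special_characters.count(line[i]) == 1);

        // ensure a space after, unless the special character ends the line
        if (i + 1 < line.size() && line[i + 1] != ' ')
            line.insert(i + 1, 1, ' ');
    }
}
```

With the set from the question, a line such as "a(b)" would become "a ( b )", and the line from the question's EDIT would be left unchanged because its colon is already surrounded by spaces.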

Answer 2

You don't check it != line.end() before the first dereference of it.

Answer 3

I could not spot the error. I would suggest stepping through slowly with the debugger, since you have already identified where it crashes.

I'll just say that, in general, modifying what you are iterating over is extremely prone to failure.

I'd recommend using Boost Tokenizer, and more precisely boost::token_iterator combined with boost::char_separator (code example included).

You could then simply build a new string from the first, and return the new string from the function. The speedup in computation should cover the cost of the memory allocation.
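As a rough sketch of the "build a new string" idea using only the standard library (not Boost Tokenizer), the function below copies the input and pads special characters on both sides. The name tokenize_copy is hypothetical, and the sketch assumes C++11 (range-based for, std::string::back) and that no trailing space is wanted after a special character at the end of the line.

```cpp
#include <set>
#include <string>

// Hypothetical helper (not from the original post): returns a copy of `line`
// in which every special character is separated from its neighbours by a
// space. The input is never modified, so no iterator can be invalidated.
std::string tokenize_copy(const std::string &line, const std::set<char> &special_characters)
{
    std::string result;
    result.reserve(line.size() * 2); // at most one extra space per character

    bool previous_was_special = false;
    for (char current : line)
    {
        const bool is_special = special_characters.count(current) != 0;
        // a separator is needed before a special character and after one
        const bool need_space = is_special || previous_was_special;

        if (need_space && !result.empty() && result.back() != ' ' && current != ' ')
            result += ' ';

        result += current;
        previous_was_special = is_special;
    }
    return result;
}
```

Because the loop only reads from line and only appends to result, there is no position to keep valid across a modification, which is the main appeal of this design over editing the string in place.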

3 answers

solution1
6 ACCEPTED 2011-04-05 16:27:42

solution2
2 2011-04-05 16:15:37

solution3
0 2011-04-05 17:16:31
