
C++: How to split a stringstream using multiple delimiters

How would I go about splitting up a stringstream into individual strings using multiple delimiters? Right now it uses the default whitespace delimiter, and I manually delete the first and last characters of each word if they are anything other than alphanumeric.

The goal here is to read in a .cpp file and parse it for all the user-defined identifiers that are not reserved words in C++.

It's working for benign examples but for stuff like this:

OrderedPair<map_iterator, bool> insert(const value_type& kvpair)

It is not working. I'd like to be able to split OrderedPair into its own word, map_iterator into its own, and likewise bool, insert, const, value_type, and kvpair, all as individual words.

How would I go about using "< > , ( & ) . -> *" as delimiters for my stringstream?

 while (getline(inFile, line)) {
    isComment = false;
    stringstream sstream(line);
    while (sstream >> word) {
        isCharLiteral = false;

        if (!isComment) {
            if (word[0] == '/' && word[1] == '/')
                isComment = true;
        }

        if (!isMultilineComment) {
            if (word[0] == '/' && word[1] == '*')
                isMultilineComment = true;
        }

        if (!isStringLiteral) {
            if (word[0] == '"')
                isStringLiteral = true;
        }

        if (!isCharLiteral) {
            if (word[0] == '\'' && word.back() == '\'')
                isCharLiteral = true;
        }

        if (isStringLiteral)
            if (word.back() == '"')
                isStringLiteral = false;

        if (isMultilineComment)
            if (word[0] == '*' && word[1] == '/')
                isMultilineComment = false;

        if (!isStringLiteral && !isMultilineComment && !isComment && !isCharLiteral) {

If you are able to use the standard library, then I would suggest using std::strtok() (declared in <cstring>) to tokenize your string. You can pass any set of delimiter characters you like to strtok().

Since you are using std::string, you would first have to copy your string into a writable, null-terminated character array of sufficient length, and then call strtok() on that array, because strtok() modifies the buffer it parses.
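As a minimal sketch of that approach: the helper name splitWithStrtok and the exact delimiter set are assumptions made for illustration, not part of the original post. Note that strtok() treats every character of the delimiter string separately, so "->" is handled as '-' and '>', and that strtok() keeps internal state, so it is not safe to interleave two tokenizations.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: splits `line` on any single character found in `delims`.
std::vector<std::string> splitWithStrtok(const std::string& line,
                                         const char* delims) {
    // strtok overwrites delimiters with '\0', so work on a writable,
    // null-terminated copy instead of the std::string itself.
    std::vector<char> buffer(line.begin(), line.end());
    buffer.push_back('\0');

    std::vector<std::string> tokens;
    for (char* tok = std::strtok(buffer.data(), delims); tok != nullptr;
         tok = std::strtok(nullptr, delims)) {
        tokens.emplace_back(tok);  // copy the token out of the buffer
    }
    return tokens;
}
```

Called with the line from the question and the delimiters " <>,(&).-*", this should yield OrderedPair, map_iterator, bool, insert, const, value_type and kvpair as separate strings.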

C++ std::istream only provides basic input methods for the most common use cases. Here you can process the std::string directly with its find_first_of and find_last_of methods (and their find_first_not_of and find_last_not_of counterparts) to locate either delimiters or non-delimiters. It is easy to build something close to the good old strtok that acts directly on a std::string instead of writing \0 into the parsed string.

But for what you are trying to achieve, you should also take into account comments, string literals, macros, and pragmas, none of which should be searched for identifiers.
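One possible shape for such a strtok-like helper is sketched below. The name splitOnAny is hypothetical, and the sketch pairs find_first_of with find_first_not_of (rather than find_last_of) to skip runs of delimiters; the input string is never modified.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns the non-empty pieces of `line` that lie
// between characters contained in `delims`.
std::vector<std::string> splitOnAny(const std::string& line,
                                    const std::string& delims) {
    std::vector<std::string> tokens;
    // Skip any leading delimiters to find the start of the first token.
    std::string::size_type start = line.find_first_not_of(delims);
    while (start != std::string::npos) {
        // The token ends at the next delimiter, or at the end of the line.
        std::string::size_type end = line.find_first_of(delims, start);
        // substr clamps the count, so end == npos takes the rest of the line.
        tokens.push_back(line.substr(start, end - start));
        // Skip the delimiter run that follows this token.
        start = line.find_first_not_of(delims, end);
    }
    return tokens;
}
```

For example, splitOnAny(line, " \t<>,(&).-*") applied to the declaration from the question should return the seven identifiers and keywords as separate strings.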

You could use a regex to replace every character you want to treat as a delimiter with whitespace, and then keep your existing whitespace-splitting setup. See http://en.cppreference.com/w/cpp/regex
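A rough sketch of this idea, assuming the delimiter characters listed in the question; the function name splitViaRegexReplace is made up for the example. Each delimiter character is turned into a space with std::regex_replace, and the result is fed to the same operator>> loop the question already uses.

```cpp
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: blanks out delimiter characters, then splits on
// whitespace exactly as the original stringstream loop does.
std::vector<std::string> splitViaRegexReplace(const std::string& line) {
    // Character class with the delimiters from the question. Inside [...]
    // the characters ( ) . * are literal; '-' is escaped to keep it literal.
    static const std::regex delims(R"([<>,()&.*\-])");

    std::stringstream sstream(std::regex_replace(line, delims, " "));
    std::vector<std::string> words;
    std::string word;
    while (sstream >> word) {
        words.push_back(word);
    }
    return words;
}
```

The trade-off is an extra copy of every line, in exchange for leaving the rest of the parsing loop untouched.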

Or get extra fancy with the regex and just match on the things you do want, and iterate through the matches.
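A sketch of the match-based variant, assuming that an identifier is a letter or underscore followed by letters, digits, or underscores. The name findIdentifiers and the pattern are illustrative choices, and the sketch does not filter out reserved words, comments, or string literals.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects every identifier-shaped match in `line`.
std::vector<std::string> findIdentifiers(const std::string& line) {
    // \b keeps the match from starting in the middle of a token such as 123abc.
    static const std::regex identifier(R"(\b[A-Za-z_]\w*)");

    std::vector<std::string> idents;
    const std::sregex_iterator end;  // default-constructed: end of sequence
    for (std::sregex_iterator it(line.begin(), line.end(), identifier);
         it != end; ++it) {
        idents.push_back(it->str());
    }
    return idents;
}
```

With this approach no delimiter list is needed at all: anything that is not part of an identifier is simply skipped between matches.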
