String tokenizer with multiple delimiters, including the delimiters, without Boost

Question

I need to create a string parser in C++. I tried using

vector<string> Tokenize(const string& strInput, const string& strDelims)
{
 vector<string> vS;

 string strOne = strInput;
 string delimiters = strDelims;

 int startpos = 0;
 int pos = strOne.find_first_of(delimiters, startpos);

 while (string::npos != pos || string::npos != startpos)
 {
  if(strOne.substr(startpos, pos - startpos) != "")
   vS.push_back(strOne.substr(startpos, pos - startpos));

  // if delimiter is a new line (\n) then add new line
  if(strOne.substr(pos, 1) == "\n")
   vS.push_back("\\n");
  // else if the delimiter is not a space
  else if (strOne.substr(pos, 1) != " ")
   vS.push_back(strOne.substr(pos, 1));

  if( string::npos == strOne.find_first_not_of(delimiters, pos) )
   startpos = strOne.find_first_not_of(delimiters, pos);
  else
   startpos = pos + 1;

  pos = strOne.find_first_of(delimiters, startpos);

 }

 return vS;
}

It works for 2X+7cos(3Y)

(tokenizer("2X+7cos(3Y)","+-/^() \\t");)

but gives a runtime error for 2X.

I need a non-Boost solution.

I also tried the C++ String Toolkit (StrTk) Tokenizer

std::vector<std::string> results;
strtk::split(delimiter, source,
             strtk::range_to_type_back_inserter(results),
             strtk::tokenize_options::include_all_delimiters);

 return results;

But it does not return the tokens as separate strings.

For example, if I give the input as 2X+3Y

the output vector contains

2X+

3Y

Answer 1

What is probably happening is that this crashes when it is passed npos:

lastPos = str.find_first_not_of(delimiters, pos);

Just add breaks inside the loop instead of relying on the while clause to get out of it.

if (pos == string::npos)
  break;
lastPos = str.find_first_not_of(delimiters, pos);

if (lastPos == string::npos)
  break;
pos = str.find_first_of(delimiters, lastPos);
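
For context, the names in the snippet above (str, lastPos) look like they come from the common tokenizer that skips delimiters, not from the question's code. Assuming that form, a minimal sketch of the whole loop with an explicit break could look like this. The function name tokenizeDroppingDelims is made up for illustration, and note that this version throws the delimiters away, which is not what the question ultimately wants.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: splits str at any character in delimiters and
// drops the delimiters themselves.
std::vector<std::string> tokenizeDroppingDelims(const std::string& str,
                                                const std::string& delimiters)
{
    std::vector<std::string> tokens;

    // Skip any leading delimiters.
    std::string::size_type lastPos = str.find_first_not_of(delimiters, 0);

    while (lastPos != std::string::npos) {
        // End of the current token (npos if it runs to the end of str).
        const std::string::size_type pos = str.find_first_of(delimiters, lastPos);

        // An oversized count is clamped by substr, so pos == npos is fine here.
        tokens.push_back(str.substr(lastPos, pos - lastPos));

        // Never pass npos on as a start position: leave the loop instead.
        if (pos == std::string::npos)
            break;

        lastPos = str.find_first_not_of(delimiters, pos);
    }
    return tokens;
}

// Small usage check for the input that failed in the question.
void checkTokenizeDroppingDelims()
{
    const std::vector<std::string> expected{"2X"};
    assert(tokenizeDroppingDelims("2X", "+-/^() \t") == expected);
}
```

Using std::string::size_type for the positions (rather than int, as in the question) keeps the comparisons against npos free of signed/unsigned conversions.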

Answer 2

The loop exit condition is broken:

while (string::npos != pos || string::npos != startpos)

lets the loop body run with, for example, pos = npos and startpos = 1.

So

strOne.substr(startpos, pos - startpos)
strOne.substr(1, npos - 1)

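The count here is npos - 1, not npos. substr clamps an oversized count to the end of the string, so this call on its own still returns the rest of the string; the BOOM comes from the substr(pos, 1) calls that follow it.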

If pos = npos and startpos = 0,

strOne.substr(startpos, pos - startpos)

survives, but

strOne.substr(pos, 1) == "\n"
strOne.substr(npos, 1) == "\n"

dies. So does

strOne.substr(pos, 1) != " "
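
The cases above can be checked in isolation. The sketch below is only an illustration (substrThrows and checkSubstrCases are made-up names): it relies on the standard behaviour that substr clamps a count that is too large, but throws std::out_of_range when the start position is greater than size().

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper: reports whether s.substr(pos, 1) throws std::out_of_range.
bool substrThrows(const std::string& s, std::string::size_type pos)
{
    try {
        const std::string piece = s.substr(pos, 1);
        (void)piece; // only the throw / no-throw outcome matters here
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}

void checkSubstrCases()
{
    const std::string s = "2X";

    // A count that is too large is clamped to the end of the string.
    assert(s.substr(1, std::string::npos - 1) == "X");
    assert(s.substr(0, std::string::npos) == "2X");

    // A start position equal to size() is allowed and gives an empty string.
    assert(!substrThrows(s, s.size()));

    // A start position past size(), such as npos, throws.
    assert(substrThrows(s, std::string::npos));
}
```

So for the input 2X, where find_first_of finds no delimiter, the unguarded substr(pos, 1) is the call to worry about.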

Sadly, I'm out of time and can't work out a full fix right now, but QuestionC has the right idea: better filtering. Something like:

    if (string::npos != pos)
    {
        if (strOne.substr(pos, 1) == "\n") // can possibly simplify this with strOne[pos] == '\n'
            vS.push_back("\\n");
        // else if the delimiter is not a space
        else if (strOne[pos] != ' ')
            vS.push_back(strOne.substr(pos, 1));
    }
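
Putting that guard into a complete function, one possible shape for the tokenizer is sketched below. This is not the asker's final code: tokenizeKeepingDelims is a made-up name, and the rules for which delimiters become tokens (a space is dropped, a newline becomes the two-character string "\\n", anything else is emitted as its own token) are copied from the question's loop.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: splits input at any character in delims and keeps
// the delimiters as separate one-character tokens, except for spaces.
std::vector<std::string> tokenizeKeepingDelims(const std::string& input,
                                               const std::string& delims)
{
    std::vector<std::string> tokens;
    std::string::size_type start = 0;

    while (start < input.size()) {
        const std::string::size_type pos = input.find_first_of(delims, start);

        // Text between the previous delimiter and this one (or the end).
        // substr clamps the count, so pos == npos takes the rest of input.
        if (pos != start)
            tokens.push_back(input.substr(start, pos - start));

        // No delimiter left: nothing more to emit, and pos must not be used.
        if (pos == std::string::npos)
            break;

        if (input[pos] == '\n')
            tokens.push_back("\\n");                       // backslash + 'n'
        else if (input[pos] != ' ')
            tokens.push_back(std::string(1, input[pos]));  // the delimiter itself

        start = pos + 1;
    }
    return tokens;
}

// Usage check with the inputs mentioned in the question.
void checkTokenizeKeepingDelims()
{
    const std::string delims = "+-/^() \t";
    const std::vector<std::string> single{"2X"};
    const std::vector<std::string> sum{"2X", "+", "3Y"};
    assert(tokenizeKeepingDelims("2X", delims) == single);
    assert(tokenizeKeepingDelims("2X + 3Y", delims) == sum);
}
```

The loop only ever indexes input[pos] after the npos check, which is the filtering described above.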

Answer 3

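I wrote a small function that splits a string into substrings (stored in a vector) and lets you choose which characters are treated as whitespace. Ordinary whitespace is still treated as whitespace, so you don't need to list it. All it really does is turn the characters you define as whitespace into actual spaces (the space char ' '). It then runs the result through a stream (stringstream) to separate the substrings and store them in the vector. This may not be what you need for this particular problem, but maybe it can give you some ideas.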

#include <sstream>
#include <string>
#include <vector>
using namespace std;

// split a string into its whitespace-separated substrings and store
// each substring in a vector<string>. Whitespace can be defined in argument
// w as a string (e.g. ".;,?-'")
vector<string> split(const string& s, const string& w)
{
    string temp{ s };
    // go through each char in temp (or s)
    for (char& ch : temp) {     
        // check if any characters in temp (s) are whitespace defined in w
        for (char white : w) {  
            if (ch == white)
                ch = ' ';       // if so, replace them with a space char (' ')
        }
    }

    vector<string> substrings;
    stringstream ss{ temp };

    for (string buffer; ss >> buffer;) {
        substrings.push_back(buffer);
    }
    return substrings;
}
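
To see what this approach does with the question's input, here is a compact variant of the same idea. It is a sketch, not the answer's code: splitOnChars is a made-up name, and it uses std::string::find for the membership test instead of the inner loop.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical variant: every character listed in extraWhitespace is turned
// into a space, then operator>> splits the text on whitespace.
std::vector<std::string> splitOnChars(const std::string& s,
                                      const std::string& extraWhitespace)
{
    std::string temp{s};
    for (char& ch : temp) {
        if (extraWhitespace.find(ch) != std::string::npos)
            ch = ' ';
    }

    std::vector<std::string> substrings;
    std::istringstream in{temp};
    for (std::string word; in >> word;)
        substrings.push_back(word);
    return substrings;
}

// Usage check: the operators and parentheses disappear from the result.
void checkSplitOnChars()
{
    const std::vector<std::string> expected{"2X", "7cos", "3Y"};
    assert(splitOnChars("2X+7cos(3Y)", "+-/^()") == expected);
}
```

Because the delimiters are replaced before splitting, they never show up as tokens, so this fits only if the operators do not need to be kept.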

Answer 4

It would be great if you could share some information about your environment. With g++ on my Fedora 20 machine, your program runs fine with the input 2X.

4 answers

Answer 1  2  2015-07-01 05:07:54

Answer 2  1  accepted  2015-07-01 06:19:13

Answer 3  0  2015-07-01 05:17:53

Answer 4  0  2015-07-01 05:58:23