Tokenize a string and include delimiters in C++

I'm tokenizing with the following function, but I'm unsure how to include the delimiters in the output.

void Tokenize(const string& str, vector<string>& tokens, const string& delimiters)
{
    // size_type rather than int, so the comparisons against string::npos are exact
    string::size_type startpos = 0;
    string::size_type pos = str.find_first_of(delimiters, startpos);

    while (string::npos != pos || string::npos != startpos)
    {
        // the text between delimiters; the delimiters themselves are dropped
        tokens.push_back(str.substr(startpos, pos - startpos));

        startpos = str.find_first_not_of(delimiters, pos);
        pos = str.find_first_of(delimiters, startpos);
    }
}

The C++ String Toolkit Library (StrTk) has the following solution:

std::string str = "abc,123 xyz";
std::vector<std::string> token_list;
strtk::split(";., ",
             str,
             strtk::range_to_type_back_inserter(token_list),
             strtk::include_delimiters);

It should result in token_list having the following elements:

Token0 = "abc,"
Token1 = "123 "
Token2 = "xyz"

More examples can be found here.
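If pulling in a library is not an option, the same result can be approximated with find_first_of alone. The following is a minimal sketch, not StrTk's implementation; the name split_keep_delims is made up for this example, and it assumes each delimiter should stay attached to the end of the token it terminates.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of StrTk): splits `text` at any character in
// `delims` and keeps each delimiter at the end of the token it terminates.
std::vector<std::string> split_keep_delims(const std::string& text,
                                           const std::string& delims)
{
    std::vector<std::string> tokens;
    std::string::size_type start = 0;

    while (start < text.size())
    {
        const std::string::size_type pos = text.find_first_of(delims, start);
        if (pos == std::string::npos)
        {
            // No delimiter left: the rest of the string is the last token.
            tokens.push_back(text.substr(start));
            break;
        }
        // Take the text plus the one delimiter character that ends it.
        tokens.push_back(text.substr(start, pos - start + 1));
        start = pos + 1;
    }
    return tokens;
}
```

Called as split_keep_delims("abc,123 xyz", ";., "), this yields "abc,", "123 " and "xyz". In this sketch, consecutive delimiters each become their own one-character token; whether that is desirable depends on the input format.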

I know this is a little sloppy, but this is what I ended up with. I did not want to use Boost, since this is a school assignment and my instructor wanted me to use find_first_of to accomplish this.

Thanks for everyone's help.

vector<string> Tokenize(const string& strInput, const string& strDelims)
{
    vector<string> vS;

    // size_type rather than int, so the comparisons against string::npos are exact
    string::size_type startpos = 0;
    string::size_type pos = strInput.find_first_of(strDelims, startpos);

    while (string::npos != pos || string::npos != startpos)
    {
        // text between the previous delimiter and this one, if any
        const string token = strInput.substr(startpos, pos - startpos);
        if (!token.empty())
            vS.push_back(token);

        // no delimiter left: the token above was the last one
        // (indexing with npos below would be out of range)
        if (string::npos == pos)
            break;

        // if the delimiter is a new line (\n), add the two characters "\n"
        if (strInput[pos] == '\n')
            vS.push_back("\\n");
        // else if the delimiter is not a space, add it as its own token
        else if (strInput[pos] != ' ')
            vS.push_back(string(1, strInput[pos]));

        // if only delimiters remain, stop; otherwise continue after this one
        if (string::npos == strInput.find_first_not_of(strDelims, pos))
            startpos = string::npos;
        else
            startpos = pos + 1;

        pos = strInput.find_first_of(strDelims, startpos);
    }

    return vS;
}

I can't really follow your code. Could you post a working program?

Anyway, this is a simple tokenizer, written without testing edge cases:

#include <iostream>
#include <string>
#include <vector>

using namespace std;

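void tokenize(vector<string>& tokens, const string& text, const string& del)
{
    // an empty delimiter would never advance, so return the text unchanged
    if (del.empty())
    {
        tokens.push_back(text);
        return;
    }

    string::size_type startpos = 0,
        currentpos = text.find(del, startpos);

    // a while loop, so that text without any delimiter is handled correctly
    while (currentpos != string::npos)
    {
        // the token plus the delimiter that ends it
        tokens.push_back(text.substr(startpos, currentpos - startpos + del.size()));

        startpos = currentpos + del.size();
        currentpos = text.find(del, startpos);
    }

    // whatever follows the last delimiter (empty if the text ends with del)
    tokens.push_back(text.substr(startpos));
}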

Example input, with delimiter $$:

Hello$$Stack$$Over$$$Flow$$$$!

Tokens:

Hello$$
Stack$$
Over$$
$Flow$$
$$
!

Note: I would never use a tokenizer I wrote without testing it! Please use boost::tokenizer instead.
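To make the point about testing concrete, a few assertions can cover the edge cases that are easy to get wrong. This is only a sketch under assumptions: tokenize_by_string and check_tokenizer are names invented for this example, the function mirrors the approach above in a compact form, and keeping an empty trailing token is one possible design choice among others.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Sketch of the same approach: split `text` at each occurrence of the
// multi-character delimiter `del`, keeping the delimiter on the token it ends.
// The name `tokenize_by_string` is made up for this example.
std::vector<std::string> tokenize_by_string(const std::string& text,
                                            const std::string& del)
{
    std::vector<std::string> tokens;
    if (del.empty())  // an empty delimiter would never advance
    {
        tokens.push_back(text);
        return tokens;
    }
    std::string::size_type start = 0;
    std::string::size_type pos = text.find(del, start);
    while (pos != std::string::npos)
    {
        tokens.push_back(text.substr(start, pos - start + del.size()));
        start = pos + del.size();
        pos = text.find(del, start);
    }
    tokens.push_back(text.substr(start));  // remainder, possibly empty
    return tokens;
}

// Edge cases worth checking before trusting a hand-written tokenizer.
void check_tokenizer()
{
    typedef std::vector<std::string> Tokens;

    // The example input from this answer.
    Tokens t = tokenize_by_string("Hello$$Stack$$Over$$$Flow$$$$!", "$$");
    assert(t.size() == 6);
    assert(t[0] == "Hello$$" && t[3] == "$Flow$$" && t[4] == "$$" && t[5] == "!");

    // No delimiter at all: the whole input is one token.
    t = tokenize_by_string("plain", "$$");
    assert(t.size() == 1 && t[0] == "plain");

    // Input ending with the delimiter: an empty trailing token is kept.
    t = tokenize_by_string("a$$", "$$");
    assert(t.size() == 2 && t[0] == "a$$" && t[1].empty());

    // Empty input yields a single empty token.
    t = tokenize_by_string("", "$$");
    assert(t.size() == 1 && t[0].empty());
}
```

Calling check_tokenizer() from main in a debug build (without NDEBUG defined) aborts on the first failing case, which is usually enough to catch off-by-one mistakes around string::npos.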

If the delimiters are characters rather than a string, then you can use strtok.
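One caveat: strtok overwrites each delimiter it finds with '\0', so the delimiters are not part of the tokens it returns. A possible workaround, sketched below, is to run strtok on a copy and read the delimiter back from the untouched original. The name strtok_keep_delims is invented for this example, and the sketch assumes the input contains no embedded null characters.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Sketch: tokenize with std::strtok, then look the delimiter up in the
// untouched original string, because strtok overwrites it with '\0'.
// `strtok_keep_delims` is a made-up name for this example.
std::vector<std::string> strtok_keep_delims(const std::string& text,
                                            const std::string& delims)
{
    std::vector<std::string> tokens;

    // strtok modifies its input, so work on a writable, null-terminated copy.
    std::vector<char> buffer(text.begin(), text.end());
    buffer.push_back('\0');

    for (char* tok = std::strtok(&buffer[0], delims.c_str());
         tok != nullptr;
         tok = std::strtok(nullptr, delims.c_str()))
    {
        // Offsets of the token within the copy match those in `text`.
        const std::string::size_type begin =
            static_cast<std::string::size_type>(tok - &buffer[0]);
        const std::string::size_type end = begin + std::strlen(tok);

        std::string token(tok);
        // The character right after the token in the original text is the
        // delimiter that ended it (absent for a token at the very end).
        if (end < text.size())
            token += text[end];
        tokens.push_back(token);
    }
    return tokens;
}
```

For "abc,123 xyz" with the delimiters ";., " this gives "abc,", "123 " and "xyz". Note that strtok skips runs of delimiters, so only the first delimiter after each token is recovered here, and that strtok keeps internal static state, so it is not safe to interleave two tokenizations.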

It depends on whether you want the preceding delimiters, the following delimiters, or both, and on what you want to do with words at the beginning and end of the string that may not have delimiters before or after them.

I'm going to assume you want each word with its preceding and following delimiters, but NOT any runs of delimiters by themselves (e.g. if there's a delimiter following the last word).

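#include <string>

template <class iter>
void tokenize(std::string const &str, std::string const &delims, iter out) {
    // size_type rather than int, so comparisons against npos are exact
    std::string::size_type pos = 0;
    do {
        // start of the next word; stop if only delimiters remain
        std::string::size_type beg_word = str.find_first_not_of(delims, pos);
        if (beg_word == std::string::npos)
            break;
        std::string::size_type end_word = str.find_first_of(delims, beg_word);
        std::string::size_type beg_next_word = str.find_first_not_of(delims, end_word);
        // the word plus the delimiters before and after it
        // (a count that runs past the end is clamped to the end of the string)
        *out++ = std::string(str, pos, beg_next_word - pos);
        pos = end_word;
    } while (pos != std::string::npos);
}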

For the moment, I've written it more like an STL algorithm, taking an iterator for its output instead of assuming it always pushes onto a collection. Since it depends (for the moment) on the input being a string, it doesn't use iterators for the input.
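The choice raised at the start of this answer (preceding delimiters, following delimiters, or both) could also be made a parameter. The sketch below is one way to do that while keeping the output-iterator style; DelimMode and tokenize_with are hypothetical names introduced only for illustration.

```cpp
#include <iterator>
#include <string>
#include <vector>

// Hypothetical policy switch: which neighbouring delimiters a word keeps.
enum DelimMode { KeepPreceding, KeepFollowing, KeepBoth };

// Sketch: writes each word of `str` to `out`, together with the run of
// delimiters before it, after it, or both, depending on `mode`.
template <class OutIt>
void tokenize_with(std::string const& str, std::string const& delims,
                   DelimMode mode, OutIt out)
{
    typedef std::string::size_type size_type;
    size_type prev_end = 0;  // where the previous word ended
    for (;;)
    {
        size_type beg_word = str.find_first_not_of(delims, prev_end);
        if (beg_word == std::string::npos)
            break;  // only delimiters (or nothing) left
        size_type end_word = str.find_first_of(delims, beg_word);
        if (end_word == std::string::npos)
            end_word = str.size();
        size_type beg_next = str.find_first_not_of(delims, end_word);
        if (beg_next == std::string::npos)
            beg_next = str.size();

        // Choose the slice boundaries according to the requested mode.
        size_type first = (mode == KeepFollowing) ? beg_word : prev_end;
        size_type last = (mode == KeepPreceding) ? end_word : beg_next;
        *out++ = str.substr(first, last - first);
        prev_end = end_word;
    }
}
```

With the input " a, b " and the delimiters " ,", writing through std::back_inserter into a std::vector<std::string> gives " a, " and ", b " for KeepBoth, " a" and ", b" for KeepPreceding, and "a, " and "b " for KeepFollowing. Because the output is an iterator, the same call works with std::ostream_iterator or any other output iterator.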
