Tokenizing a string and including the delimiters in C++

Question

I'm tokenizing with the following code, but I'm not sure how to include the delimiters in the result.

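#include <string>
#include <vector>
using namespace std;

void Tokenize(const string& str, vector<string>& tokens, const string& delimiters)
{
    // size_type rather than int, so comparisons with string::npos are reliable.
    string::size_type startpos = 0;
    string::size_type pos = str.find_first_of(delimiters, startpos);

    while (string::npos != pos || string::npos != startpos)
    {
        // Token up to, but not including, the next delimiter.
        tokens.push_back(str.substr(startpos, pos - startpos));

        // Skip the run of delimiters, then find the end of the next token.
        startpos = str.find_first_not_of(delimiters, pos);
        pos = str.find_first_of(delimiters, startpos);
    }
}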

Answer 1

The C++ String Toolkit Library (StrTk) has the following solution:

std::string str = "abc,123 xyz";
std::vector<std::string> token_list;
strtk::split(";., ",
             str,
             strtk::range_to_type_back_inserter(token_list),
             strtk::include_delimiters);

This should leave token_list with the following elements:

Token₀ = "abc,"
Token₁ = "123 "
Token₂ = "xyz"

More examples can be found here
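If pulling in a library is not an option, the same result can be approximated with find_first_of alone. The following is only a minimal sketch: split_keep_delims is a hypothetical helper, not part of StrTk, and it assumes every delimiter is a single character that should stay attached to the end of the token it terminates.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of StrTk): splits `str` at any character in
// `delims` and keeps each delimiter at the end of the token it terminates.
std::vector<std::string> split_keep_delims(const std::string& str,
                                           const std::string& delims)
{
    std::vector<std::string> tokens;
    std::string::size_type start = 0;

    while (start < str.size()) {
        const std::string::size_type pos = str.find_first_of(delims, start);
        if (pos == std::string::npos) {
            // No delimiter left: the rest of the string is the last token.
            tokens.push_back(str.substr(start));
            break;
        }
        // pos - start + 1 characters: the word plus its trailing delimiter.
        tokens.push_back(str.substr(start, pos - start + 1));
        start = pos + 1;
    }
    return tokens;
}
```

Calling split_keep_delims("abc,123 xyz", ";., ") gives "abc,", "123 " and "xyz". Note that in this sketch two delimiters in a row produce a token holding only the second delimiter, which may not match what StrTk does.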

Answer 2

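This is a bit sloppy right now, but it is what I ended up with. I did not want to use boost, because this is a school assignment and my instructor wanted me to use find_first_of to accomplish it.

Thanks, everyone, for the help.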

#include <string>
#include <vector>
using namespace std;

vector<string> Tokenize(const string& strInput, const string& strDelims)
{
    vector<string> vS;

    // size_type rather than int, so comparisons with string::npos are reliable.
    string::size_type startpos = 0;
    string::size_type pos = strInput.find_first_of(strDelims, startpos);

    while (true)
    {
        // Word before the delimiter (or the rest of the input when pos is npos);
        // empty words are skipped.
        const string word = strInput.substr(startpos, pos - startpos);
        if (!word.empty())
            vS.push_back(word);

        // No delimiter left: substr(pos, 1) would throw, so stop here.
        if (string::npos == pos)
            break;

        // If the delimiter is a newline (\n), add an escaped "\n" token
        if (strInput[pos] == '\n')
            vS.push_back("\\n");
        // else keep the delimiter unless it is a space
        else if (strInput[pos] != ' ')
            vS.push_back(string(1, strInput[pos]));

        // As in the first draft, stop once only delimiters remain.
        if (string::npos == strInput.find_first_not_of(strDelims, pos))
            break;

        startpos = pos + 1;
        pos = strInput.find_first_of(strDelims, startpos);
    }

    return vS;
}
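The idea in this answer, emitting each delimiter as a token of its own, can be reduced to a shorter loop when the newline and space special cases are not needed. This is a sketch under that assumption; split_with_delim_tokens is a hypothetical name and the function keeps every delimiter, including spaces.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns words and delimiters as separate tokens,
// in the order they appear in `input`. Empty words are skipped.
std::vector<std::string> split_with_delim_tokens(const std::string& input,
                                                 const std::string& delims)
{
    std::vector<std::string> tokens;
    std::string::size_type start = 0;

    while (start < input.size()) {
        const std::string::size_type pos = input.find_first_of(delims, start);
        // End of the current word: the delimiter, or the end of the input.
        const std::string::size_type end =
            (pos == std::string::npos) ? input.size() : pos;

        if (end > start)
            tokens.push_back(input.substr(start, end - start));
        if (pos == std::string::npos)
            break;

        // The delimiter itself becomes a one-character token.
        tokens.push_back(std::string(1, input[pos]));
        start = pos + 1;
    }
    return tokens;
}
```

For the input "a+b, c" with delimiters "+, " this yields "a", "+", "b", ",", " ", "c". Filtering out unwanted delimiter tokens (such as spaces) afterwards keeps the splitting loop free of special cases.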

Answer 3

I can't really follow your code. Could you post a working program?

Anyway, here is a simple tokenizer; edge cases are untested:

#include <iostream>
#include <string>
#include <vector>

using namespace std;

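void tokenize(vector<string>& tokens, const string& text, const string& del)
{
    // An empty delimiter matches everywhere and would never advance.
    if (del.empty())
    {
        tokens.push_back(text);
        return;
    }

    string::size_type startpos = 0,
        currentpos = text.find(del, startpos);

    // Each token runs up to and including the delimiter that ends it.
    // A while loop (not do/while) so that text without any delimiter
    // does not compute a length from string::npos.
    while (currentpos != string::npos)
    {
        tokens.push_back(text.substr(startpos, currentpos - startpos + del.size()));

        startpos = currentpos + del.size();
        currentpos = text.find(del, startpos);
    }

    // Whatever follows the last delimiter is the final token.
    tokens.push_back(text.substr(startpos));
}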

Sample input, delimiter = $$:

Hello$$Stack$$Over$$$Flow$$$$!

Tokens:

Hello$$
Stack$$
Over$$
$Flow$$
$$
!

Note: I would never use a tokenizer I wrote without testing it! Please use boost::tokenizer!

Answer 4

If the delimiters are single characters rather than strings, you can use strtok.
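One caveat: strtok overwrites each delimiter with a NUL character, so it does not return the delimiters by itself. A possible workaround, sketched below, is to tokenize a copy and look up the overwritten character in the original string. strtok_keep_delims is a hypothetical helper, and the sketch assumes it is fine to keep only the first delimiter after each token, since strtok skips over runs of delimiters. strtok also keeps internal state, so it is not safe to use from several threads at once.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: tokenizes with std::strtok and re-attaches the
// delimiter that followed each token by looking it up in the original input.
std::vector<std::string> strtok_keep_delims(const std::string& input,
                                            const char* delims)
{
    std::vector<std::string> tokens;

    // strtok modifies its argument, so work on a NUL-terminated copy.
    std::vector<char> buffer(input.begin(), input.end());
    buffer.push_back('\0');

    for (char* tok = std::strtok(&buffer[0], delims);
         tok != nullptr;
         tok = std::strtok(nullptr, delims)) {
        std::string token(tok);
        // Index in the original string of the character right after the token.
        const std::string::size_type after =
            static_cast<std::string::size_type>(tok - &buffer[0]) + token.size();
        if (after < input.size())
            token += input[after];  // the delimiter that strtok overwrote
        tokens.push_back(token);
    }
    return tokens;
}
```

With the input "abc,123 xyz" and delimiters ";., " this produces "abc,", "123 " and "xyz".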

Answer 5

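It depends on whether you want the preceding delimiters, the following delimiters, or both, and on what you want to do with words at the beginning and end of the string, which may have no delimiters before or after them.

I'm going to assume you want each word together with its preceding and following delimiters, but not any run of delimiters by itself (for example, when a delimiter follows the last word).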

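#include <string>

template <class iter>
void tokenize(std::string const &str, std::string const &delims, iter out) {
    // size_type rather than int, so comparisons with npos are reliable.
    std::string::size_type pos = 0;
    do {
        std::string::size_type beg_word = str.find_first_not_of(delims, pos);
        if (beg_word == std::string::npos)
            break;
        std::string::size_type end_word = str.find_first_of(delims, beg_word);
        std::string::size_type beg_next_word = str.find_first_not_of(delims, end_word);
        // From the preceding delimiters up to the start of the next word.
        *out++ = std::string(str, pos, beg_next_word - pos);
        pos = end_word;
    } while (pos != std::string::npos);
}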

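For the moment, I've written it more like an STL algorithm: it takes an iterator for its output instead of assuming it always pushes onto a collection. Since it depends (for the moment) on the input being a string, it does not use iterators for the input.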
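To illustrate what the output iterator buys, here is a sketch that sends the same tokens either to a vector or to a stream. The tokenize function is repeated with std::string::size_type indices so the block compiles on its own; tokens_to_vector and tokens_to_text are hypothetical wrappers added only for this example.

```cpp
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

template <class OutIt>
void tokenize(const std::string& str, const std::string& delims, OutIt out)
{
    typedef std::string::size_type size_type;
    size_type pos = 0;
    do {
        const size_type beg_word = str.find_first_not_of(delims, pos);
        if (beg_word == std::string::npos)
            break;
        const size_type end_word = str.find_first_of(delims, beg_word);
        const size_type beg_next_word = str.find_first_not_of(delims, end_word);
        // Word plus the delimiters on both sides of it.
        *out++ = std::string(str, pos, beg_next_word - pos);
        pos = end_word;
    } while (pos != std::string::npos);
}

// Hypothetical wrapper: collect the tokens in a container.
std::vector<std::string> tokens_to_vector(const std::string& s,
                                          const std::string& delims)
{
    std::vector<std::string> v;
    tokenize(s, delims, std::back_inserter(v));
    return v;
}

// Hypothetical wrapper: write the tokens to a stream, one per line.
std::string tokens_to_text(const std::string& s, const std::string& delims)
{
    std::ostringstream os;
    tokenize(s, delims, std::ostream_iterator<std::string>(os, "\n"));
    return os.str();
}
```

For "one, two" with delimiters ", " the tokens are "one, " and ", two": the delimiters between two words appear in both neighbouring tokens, which follows from the assumption stated above.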

5 answers

Answer 1: 17, accepted
Answer 2: 4, 2009-10-03 15:50:42
Answer 3: 2, 2009-10-02 18:38:19
Answer 4: 2, 2009-10-02 20:17:16
Answer 5: 0, 2009-10-02 19:04:06
