Splitting a string in C using the strtok function

Question

I'm trying to split some strings on spaces. However, some of the splits come out wrong: I want to split on spaces, but keep quoted substrings together as one piece.

For example,

#include <stdio.h>
#include <string.h>

int main(void)
{
    char *pch;
    char str[] = "hello \"Stack Overflow\" good luck!";
    pch = strtok(str, " ");
    while (pch != NULL)
    {
        printf("%s\n", pch);
        pch = strtok(NULL, " ");
    }
    return 0;
}

This gives me:

hello
"Stack
Overflow"
good
luck!

but, as you can probably guess, what I want is:

hello
Stack Overflow
good
luck!

Any suggestions or ideas?

Answer 1

You need to tokenize twice. The flow of your current program is:

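1) Search for a space.

2) Print all characters before the space.

3) Search for the next space.

4) Print all characters between the previous space and this one.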

You need to start thinking in terms of something else: two layers of tokenization.

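Search for quotes.
On the odd-numbered strings, run your original procedure (search for spaces).
On the even-numbered strings, print them blindly.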

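In this scheme, the even-numbered strings are (ideally) the ones inside quotes: ab "cd" ef splits into ab as piece 1 (odd), cd as piece 2 (even), ef as piece 3 (odd), and so on.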

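Another way to look at it: what you are actually looking for is, in regular-expression terms, either "[a-zA-Z0-9 \t\n]*" (including the surrounding quotes) or [a-zA-Z0-9]+. The only difference between the two alternatives is whether the text is delimited by quotes, so split on the quotes first and classify each piece from there.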

Answer 2

Try a different strategy.

Look at the non-space characters, and when you find a quoted string, collect it into a single string value.

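So you need a function that examines the characters between whitespace. When you find a '"', you switch rules and take everything up to the matching '"'. If this function returns a TOKEN type together with a value (the matched string), its caller can decide how to output it correctly. What you have then written is a tokenizer; such programs are called lexers, and because they are so widely used to implement programming languages and configuration files, there are tools that generate them for you.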

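Assuming nextc() reads the next char from the string, starting from firstc(str) (firstc, nextc, readQuote and readWord are helpers you would write yourself):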

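int c;
for (firstc(str); (c = nextc()) != '\0';) {
    if (isspace(c))
        continue;             /* skip whitespace between tokens */
    else if (c == '"')
        return readQuote();   /* QUOTE: read up to the matching '"' */
    else
        return readWord(c);   /* WORD: c plus everything up to a space or '"' */
}
return EOS;                   /* end of string */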

You need to define the return values EOS, QUOTE and WORD, plus a way to get at the text of each quote or word.

Answer 3

Here is working code in C.

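The idea is to tokenize on quotes first, because they take priority (if a string is inside quotes, we don't tokenize it further; we just print it). Each piece produced by that split is then tokenized on spaces, but only every other piece, because the pieces alternate between outside and inside quotes.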

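/* strtok_r is POSIX, not ISO C. It is needed here because plain strtok keeps
   a single hidden state and cannot run two tokenizations at the same time. */
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <stdbool.h>

int main(void) {
  char *pch1, *pch2, *save_ptr1, *save_ptr2;
  char str[] = "hello \"Stack Overflow\" good luck!";
  /* strtok_r skips leading delimiters, so if the string starts with a quote
     the first piece is quoted text; check this before tokenizing. */
  bool in = (str[0] == '"');
  /* Outer level: split on quotes; pieces alternate outside/inside quotes. */
  pch1 = strtok_r(str, "\"", &save_ptr1);
  while (pch1 != NULL) {
    if (in) {
      /* Inside quotes: print the whole piece without splitting it. */
      printf("%s\n", pch1);
      pch1 = strtok_r(NULL, "\"", &save_ptr1);
      in = false;
      continue;
    }
    /* Outside quotes: split this piece on spaces with a second state. */
    pch2 = strtok_r(pch1, " ", &save_ptr2);
    while (pch2 != NULL) {
      printf("%s\n", pch2);
      pch2 = strtok_r(NULL, " ", &save_ptr2);
    }
    pch1 = strtok_r(NULL, "\"", &save_ptr1);
    in = true;
  }
  /* Caveat: strtok_r also collapses consecutive quotes, so an empty ""
     can still throw the inside/outside alternation off. */
  return 0;
}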


Answer 4

This one is in C++. I'm sure it could be written more elegantly, but it works and it's a start:

#include <iostream>
#include <stdexcept>
#include <vector>
#include <string>

using namespace std;

using Tokens = vector<string>;


Tokens split(string const & sentence) {
  Tokens tokens;
  // indexes to split on
  string::size_type from = 0, to;

  // true if we are inside quotes: we don't split by spaces and we expect a closing quote
  // false otherwise
  bool in_quotes = false;

  while (true) {
    // compute to index
    if (!in_quotes) {
      // find next space or quote
      to = sentence.find_first_of(" \"", from);
      if (to != string::npos && sentence[to] == '\"') {
        // we found an opening quote
        in_quotes = true;
      }
    } else {
      // find next quote (ignoring spaces)
      to = sentence.find('\"', from);
      if (to == string::npos) {
        // no enclosing quote found, invalid string
        throw invalid_argument("missing enclosing quotes");
      }
      in_quotes = false;
    }
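    if (to == string::npos) {
      // last token: take the rest of the string, unless nothing is left
      // (e.g. the sentence ended with a space or a closing quote)
      if (from < sentence.size()) {
        tokens.push_back(sentence.substr(from));
      }
      break;
    }
    // skip empty tokens (consecutive delimiters)
    if (from != to) {
      tokens.push_back(sentence.substr(from, to - from));
    }
    // move from index past the delimiter
    from = to + 1;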
  }
  return tokens;
}

Testing it:

void splitAndPrint(string const & sentence) {
  Tokens tokens;
  cout << "-------------" << endl;
  cout << sentence << endl;
  try {
    tokens = split(sentence);
  } catch (exception &e) {
    cout << e.what() << endl;
    return;
  }
  for (const auto &token : tokens) {
    cout << token << endl;
  }
  cout << endl;
}

int main() {
  splitAndPrint("hello \"Stack Overflow\" good luck!");
  splitAndPrint("hello \"Stack Overflow\" good luck from \"User Name\"");
  splitAndPrint("hello and good luck!");
  splitAndPrint("hello and \" good luck!");

  return 0;
}

Output:

-------------
hello "Stack Overflow" good luck!
hello
Stack Overflow
good
luck!

-------------
hello "Stack Overflow" good luck from "User Name"
hello
Stack Overflow
good
luck
from
User Name

-------------
hello and good luck!
hello
and
good
luck!

-------------
hello and " good luck!
missing enclosing quotes


4 solutions:

Solution 1: score 2, 2014-06-06 14:39:40

Solution 2: score 1, accepted, 2014-06-06 14:34:04

Solution 3: score 0, 2014-06-06 15:02:38

Solution 4: score -1, 2014-06-06 15:03:49