How do I scan multiple words with sscanf in C?

Question

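I'm trying to scan a line that contains multiple words in C. Is there a way to scan it word by word and store each word in a different variable?

For example, I have lines of the following form: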

A is the 1 letter;
B is the 2 letter;
C is the 3 letter;

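If I'm parsing the first line, "A is the 1 letter", and I have the code below, what should I put in each case so that I get the individual tokens and store them as variables? To clarify: by the end of the code I want "is", "the", "1" and "letter" in different variables.

I have the following code: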

while (feof(theFile) != 1) {
    string = "A is the 1 letter"
    first_word = sscanf(string);
    switch(first_word):
      case "A":
        what to put here?
      case "B":
        what to put here?
      ...

Answer 1

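You shouldn't use feof() like that. Use fgets() or an equivalent instead. You will probably also want the little-known (but present in standard C89) conversion specifier %n.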

#include <stdio.h>

int main(void)
{
    char buffer[1024];

    while (fgets(buffer, sizeof(buffer), stdin) != 0)
    {
        char *str = buffer;
        char word[256];
        int  posn;
        while (sscanf(str, "%255s%n", word, &posn) == 1)
        {
            printf("Word: <<%s>>\n", word);
            str += posn;
        }
    }
    return(0);
}

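This reads a line, then calls sscanf() repeatedly to pull the words out of that line. The %n conversion specification is not counted as a successful conversion, hence the comparison with 1. Note that %255s prevents word from overflowing. Note too that sscanf() may write a null byte after the 255 characters allowed by the conversion specification, which is why the declaration char word[256]; and the conversion specifier %255s differ by one.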

Obviously, what you do with each word as it is extracted is up to you; the code above simply prints it.

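One advantage of this technique over any strtok()-based solution is that sscanf() does not modify the input string, so if you need to report an error you still have the original input line to include in the message.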

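Now that the question has been edited, it seems that punctuation such as the semicolon is not wanted in a word, whereas the code above treats punctuation as part of the word. In that case you have to think a bit harder. A reasonable starting point is to use an alphanumeric scan set instead of %255s as the conversion specification: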

"%255[a-zA-Z_0-9]%n"

You then probably have to look at the character at the start of the next component and skip over it if it is not alphanumeric:

if (!isalnum((unsigned char)*str))
{
    if (sscanf(str, "%*[^a-zA-Z_0-9]%n", &posn) == 0)
        str += posn;
}

This leads to:

#include <stdio.h>
#include <ctype.h>

int main(void)
{
    char buffer[1024];

    while (fgets(buffer, sizeof(buffer), stdin) != 0)
    {
        char *str = buffer;
        char word[256];
        int  posn;
        while (sscanf(str, "%255[a-zA-Z_0-9]%n", word, &posn) == 1)
        {
            printf("Word: <<%s>>\n", word);
            str += posn;
            if (!isalnum((unsigned char)*str))
            {
                if (sscanf(str, "%*[^a-zA-Z_0-9]%n", &posn) == 0)
                    str += posn;
            }
        }
    }
    return(0);
}

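You'll need to think about the I18N and L10N aspects of the alphanumeric ranges you choose. What is available may depend on your implementation (unfortunately, POSIX does not specify whether scanf() scan sets support notations such as [[:alnum:]]).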

Answer 2

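You can use strtok() to tokenize (split) the string. See the following link for an example: http://www.cplusplus.com/reference/cstring/strtok/

You can take an array of character pointers and assign the tokens to it.

Example: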

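char *tokens[100];
int i = 0;
/* string must be a modifiable char array: strtok() writes '\0' into it. */
char *token = strtok(string, " ");
while (token != NULL && i < 100) {   /* stop before overflowing tokens[] */
    tokens[i] = token;
    token = strtok(NULL, " ");
    i++;
}

printf("Total Tokens: %d\n", i);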

Answer 3

Note that the %s specifier skips leading whitespace. So you can write:

    std::string s = "A is the 1 letter";
    typedef char Word[128];
    Word words[6];
    // The field width is 127 so that the terminating null fits in each 128-byte Word.
    int wordsRead = sscanf(s.c_str(), "%127s%127s%127s%127s%127s%127s", words[0], words[1], words[2], words[3], words[4], words[5] );
    std::cout << wordsRead << " words read" << std::endl;
    // Use < rather than != so that an EOF (-1) return cannot run past the array.
    for (int i = 0; i < wordsRead; ++i)
        std::cout << "'" << words[i] << "'" << std::endl;

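Note that this approach (unlike strtok) effectively requires you to make assumptions about the maximum number of words to read and about their lengths.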

Answer 4

I suggest using strtok(). Here is the example from http://www.cplusplus.com/reference/cstring/strtok/

#include <stdio.h>
#include <string.h>

int main ()
{
  char str[] ="- This, a sample string.";
  char * pch;
  printf ("Splitting string \"%s\" into tokens:\n",str);
  pch = strtok (str," ,.-");
  while (pch != NULL)
  {
    printf ("%s\n",pch);
    pch = strtok (NULL, " ,.-");
  }
  return 0;
}

The output will be:

Splitting string "- This, a sample string." into tokens:
This
a
sample
string

Answer 1
2 2012-12-07 01:10:00

Answer 2
1 2012-12-07 01:02:39

Answer 3
0 2012-12-07 01:04:36

Answer 4
0 2012-12-07 01:07:06
