Parsing text in C

Question

I have a file like this:

...
words 13
more words 21
even more words 4
...

(The general format is a run of non-digits, then a space, then any number of digits and a newline.)

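I want to parse each line, putting the words into one field of a struct and the number into another. Right now I'm using an ugly hack: I read characters while they are not digits, then read the rest. I'm sure there is a cleaner way.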

Answer 1

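Edit: You can use pNum-buf to get the length of the alphabetic part of the string, and strncpy() to copy it into another buffer. Be sure to add a '\0' at the end of the destination buffer, because strncpy() does not add one when it copies exactly len characters. I would insert this code before the pNum++.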

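int len = pNum-buf;          /* characters before the last space */
strncpy(newBuf, buf, len);   /* newBuf must hold at least len+1 chars */
newBuf[len] = '\0';          /* strncpy() does not terminate here */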

You can read the whole line into a buffer and then use:

char *pNum;
if ((pNum = strrchr(buf, ' ')) != NULL) {   /* last space on the line */
  pNum++;                                   /* step to the first digit */
}

to get a pointer to the number field.
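Putting the pieces of this answer together, a minimal sketch might look like the following. The helper name split_at_last_space is hypothetical, and strtol() is swapped in for a plain conversion so that a missing or malformed number can be detected. Like the answer, it assumes the number follows the last space on the line, so a line with a trailing space is rejected.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: splits "more words 21" at its last space into a
 * word buffer and a number. wordSize is the capacity of word, including
 * the terminating '\0'. Returns 0 on success, -1 on malformed input. */
int split_at_last_space(const char *buf, char *word, size_t wordSize, long *number)
{
    const char *pNum = strrchr(buf, ' ');   /* last space on the line */
    if (pNum == NULL)
        return -1;                          /* no space: no number field */

    size_t len = (size_t)(pNum - buf);      /* length of the word part */
    if (len >= wordSize)
        return -1;                          /* word part would not fit */
    memcpy(word, buf, len);
    word[len] = '\0';                       /* terminate the copy */

    pNum++;                                 /* step past the space */
    char *end;
    *number = strtol(pNum, &end, 10);
    if (end == pNum || (*end != '\0' && *end != '\n'))
        return -1;                          /* number missing or followed by junk */
    return 0;
}
```

Reading each line with fgets() and passing it to this helper keeps the newline handling in one place, since a trailing '\n' after the digits is accepted.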

Answer 2

fscanf(file, "%127s %d", word, &value);   /* width limit assumes char word[128] */

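This converts the values directly into a string and an integer, and handles variations in whitespace, number formatting, and so on.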

Edit

Oops, I forgot that your words contain spaces. In that case, I would do the following. (Note that it truncates the original text in "line".)

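#include <stdlib.h>   /* atoi */

/* Illustrative wrapper: splits line in place at its last space.
 * Returns 0 on success, -1 on a parse error. */
int split_line(char *line, char **word, int *number)
{
    // Scan to find the last space in the line
    char *p = line;
    char *lastSpace = NULL;
    while (*p != '\0')
    {
        if (*p == ' ')
            lastSpace = p;
        p++;
    }

    if (lastSpace == NULL)
        return -1;   // parse error: no space on the line

    // Replace the last space in the line with a NUL
    *lastSpace = '\0';

    // Advance past the NUL to the first character of the number field
    lastSpace++;

    *word = line;
    *number = atoi(lastSpace);
    return 0;
}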

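You could solve this with standard library functions, but the approach above is probably more efficient, because you only search for the character you are interested in.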
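For comparison, here is a standard-library sketch (not part of the original answer) that copes with multi-word text by using an sscanf() scanset instead of %s. %127[^0123456789] reads up to 127 non-digit characters, spaces included, and %ld then reads the number. The digits are listed explicitly because a range such as 0-9 inside a scanset is implementation-defined. The helper name is hypothetical, and it assumes the word part contains no digits, as the question states.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parses "even more words 4" using a scanset.
 * word must hold at least 128 bytes. Returns 0 on success, -1 otherwise. */
int parse_with_scanset(const char *line, char word[128], long *number)
{
    /* Non-digits (spaces included) go into word, then the number. */
    if (sscanf(line, "%127[^0123456789]%ld", word, number) != 2)
        return -1;

    /* The scanset also swallowed the space before the number; trim it. */
    size_t len = strlen(word);
    while (len > 0 && word[len - 1] == ' ')
        word[--len] = '\0';
    return len > 0 ? 0 : -1;
}
```

Because the scanset stops at the first digit, a word part that contains a digit (for example "r2d2 7") will be split in the wrong place, so this only fits input that really matches the stated format.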

Answer 3

Given the description, I think I would use a variant of this (now tested) C99 code:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <ctype.h>

struct word_number
{
    char word[128];
    long number;
};

int read_word_number(FILE *fp, struct word_number *wnp)
{
    char buffer[140];
    if (fgets(buffer, sizeof(buffer), fp) == 0)
        return EOF;
    size_t len = strlen(buffer);
    if (len == 0 || buffer[len-1] != '\n')  // Error if line too long to fit (or no final newline)
        return EOF;
    buffer[--len] = '\0';
    if (len == 0)              // Empty line
        return EOF;
    char *num = &buffer[len-1];
    while (num > buffer && !isspace((unsigned char)*num))
        num--;
    if (num == buffer)         // No space in input data
        return EOF;
    char *end;
    wnp->number = strtol(num+1, &end, 0);
    if (end == num+1 || *end != '\0')  // Missing or invalid number as last word on line
        return EOF;
    *num = '\0';
    if ((size_t)(num - buffer) >= sizeof(wnp->word))  // Non-number part too long
        return EOF;
    memcpy(wnp->word, buffer, num - buffer + 1);  // Copy the word part plus its '\0'
    return(0);
}

int main(void)
{
    struct word_number wn;
    while (read_word_number(stdin, &wn) != EOF)
        printf("Word <<%s>> Number %ld\n", wn.word, wn.number);
    return(0);
}

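You could improve the error reporting by returning different values for different problems. You could make it work with dynamically allocated memory for the word part of the line. You could make it accept longer lines than I allowed. You could scan backwards for digits instead of non-spaces, but scanning for non-spaces lets the user write "abc 0x123" and have the hexadecimal value handled correctly. You might prefer to ensure that the word part contains no digits; this code doesn't care.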
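Two of the variations suggested above can be sketched together: scanning backwards over digits rather than non-spaces, and rejecting a word part that contains digits. The helper name split_trailing_digits is hypothetical. As the answer notes, this loses hexadecimal input: "abc 0x123" is rejected, because the character before the digits is 'x', not a space.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: modifies line in place. On success, *word points at
 * the word part and *digits at the number text (convert it with strtol),
 * and 0 is returned; otherwise -1. */
int split_trailing_digits(char *line, char **word, char **digits)
{
    size_t len = strlen(line);
    if (len > 0 && line[len - 1] == '\n')
        line[--len] = '\0';                 /* drop the newline */

    char *p = line + len;
    while (p > line && isdigit((unsigned char)p[-1]))
        p--;                                /* back up over the digits */
    if (p == line + len || p == line || p[-1] != ' ')
        return -1;                          /* no digits, or no space before them */

    p[-1] = '\0';                           /* end the word part at the space */
    if (strpbrk(line, "0123456789") != NULL)
        return -1;                          /* word part contains a digit */
    *word = line;
    *digits = p;
    return 0;
}
```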

Answer 4

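You could try tokenizing each line with strtok() and then checking whether each token is a number or a word (once you have the token string, the check is fairly simple: just look at the token's first character).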
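A minimal sketch of this strtok() approach, under the assumption that any token starting with a digit is the number. The helper name split_with_strtok is hypothetical. Because strtok() discards the delimiters, the words are re-joined with single spaces, so runs of several spaces collapse to one. strtok() also keeps internal state, so it is not safe to interleave with other strtok() calls.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: tokens whose first character is a digit are taken
 * as the number; every other token is appended, space-separated, to word.
 * Modifies line. Returns 0 if a number was found and the words fit in
 * wordSize bytes, otherwise -1. */
int split_with_strtok(char *line, char *word, size_t wordSize, long *number)
{
    int found = 0;
    word[0] = '\0';
    for (char *tok = strtok(line, " \n"); tok != NULL; tok = strtok(NULL, " \n"))
    {
        if (isdigit((unsigned char)tok[0]))
        {
            *number = strtol(tok, NULL, 10);    /* first char decides: number */
            found = 1;
        }
        else
        {
            size_t used = strlen(word);
            size_t need = used + (used > 0) + strlen(tok) + 1;
            if (need > wordSize)
                return -1;                      /* words do not fit */
            if (used > 0)
                strcat(word, " ");              /* re-join words with one space */
            strcat(word, tok);
        }
    }
    return found ? 0 : -1;
}
```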

Answer 5

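Assume the number is immediately followed by '\n'. You can read each line into a character buffer, use sscanf("%d") on the whole line to get the number, and then count how many characters the number occupies at the end of the text string.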
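One caveat: sscanf(line, "%d", ...) applied to the start of a line such as "words 13" fails immediately at the 'w' and converts nothing, so the number has to be located before sscanf() can read it. The sketch below makes the idea work under the answer's assumption that the number runs up to the '\n'. It backs up over the trailing digits, reads them with sscanf() (using %n to count the characters the number occupies), and everything before that, minus the separating space, is the word part. The helper name is hypothetical, and the number is assumed to fit in an int.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper. Copies the text before the trailing number, minus the
 * separating space, into word (wordSize bytes) and stores the number.
 * Returns 0 on success, -1 if the line does not end in a number. */
int split_by_number_length(const char *line, char *word, size_t wordSize, int *number)
{
    size_t len = strcspn(line, "\n");        /* line length without the newline */
    size_t start = len;
    while (start > 0 && isdigit((unsigned char)line[start - 1]))
        start--;                             /* back up to the first digit */

    int numChars = 0;                        /* characters the number occupies */
    if (start == len || sscanf(line + start, "%d%n", number, &numChars) != 1)
        return -1;
    if (start + (size_t)numChars != len)
        return -1;                           /* number must run to end of line */

    size_t wordLen = start;                  /* everything before the number... */
    if (wordLen > 0 && line[wordLen - 1] == ' ')
        wordLen--;                           /* ...minus the separating space */
    if (wordLen == 0 || wordLen >= wordSize)
        return -1;
    memcpy(word, line, wordLen);
    word[wordLen] = '\0';
    return 0;
}
```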

Answer 6

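Depending on how complex your strings get, you might want to use the PCRE library. At least that way you could compile a Perl-ish regular expression to split your lines. That might be overkill, though.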
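To illustrate the regular-expression idea without an extra dependency, here is a sketch that swaps in the POSIX <regex.h> API (regcomp()/regexec(), available on POSIX systems but not in standard C) in place of PCRE. The pattern is an assumption about the format: text ending in a non-space character, one or more spaces, then digits, and an optional newline at the end of the string. The helper name is hypothetical.

```c
#include <regex.h>
#include <string.h>

/* Hypothetical helper: copies the two captured groups into word and digits.
 * Returns 0 on a match, -1 otherwise. */
int split_with_regex(const char *line, char *word, size_t wordSize,
                     char *digits, size_t digitsSize)
{
    regex_t re;
    regmatch_t m[3];                        /* whole match + two groups */
    if (regcomp(&re, "^(.*[^ ]) +([0-9]+)\n?$", REG_EXTENDED) != 0)
        return -1;
    int rc = regexec(&re, line, 3, m, 0);
    regfree(&re);
    if (rc != 0)
        return -1;                          /* line does not match the format */

    size_t wl = (size_t)(m[1].rm_eo - m[1].rm_so);
    size_t dl = (size_t)(m[2].rm_eo - m[2].rm_so);
    if (wl >= wordSize || dl >= digitsSize)
        return -1;                          /* a field would not fit */
    memcpy(word, line + m[1].rm_so, wl);
    word[wl] = '\0';
    memcpy(digits, line + m[2].rm_so, dl);
    digits[dl] = '\0';
    return 0;
}
```

For a whole file you would normally call regcomp() once before the read loop and regfree() after it, rather than recompiling the pattern for every line as this compact sketch does.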

Answer 7

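Given the description, here is what I would do: read each line as a single string with fgets() (making sure the destination buffer is large enough), then split the line with strtok(). To decide whether each token is a word or a number, I would use strtol() to attempt the conversion and check for the error condition. Example: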

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

#define MAX_LINE_LENGTH 256   /* example limits; adjust to your data */
#define MAX_STR_SIZE     64
#define MAX_NUM_STRINGS  16

/**
 * Read the next line from the file, splitting the tokens into 
 * multiple strings and a single integer. Assumes input lines
 * never exceed MAX_LINE_LENGTH and each individual string never
 * exceeds MAX_STR_SIZE.  Otherwise things get a little more
 * interesting (here, words that do not fit are skipped).  Also
 * assumes that the integer is the last thing on each line.  
 */
int getNextLine(FILE *in, char (*strs)[MAX_STR_SIZE], int *numStrings, int *value)
{
  char buffer[MAX_LINE_LENGTH];
  int rval = 1;
  if (fgets(buffer, sizeof buffer, in))
  {
    /* '\n' is a delimiter too, so no token carries the newline */
    char *token = strtok(buffer, " \n");
    *numStrings = 0;
    while (token) 
    {
      char *chk;
      long n = strtol(token, &chk, 10);
      if (chk == token || *chk != '\0')
      {
        /* Not a valid number: store it as a word if there is room */
        if (*numStrings < MAX_NUM_STRINGS && strlen(token) < MAX_STR_SIZE)
          strcpy(strs[(*numStrings)++], token);
      }
      else
      {
        *value = (int) n;
      }
      token = strtok(NULL, " \n");
    }
  }
  else
  {
    /** 
     * fgets() hit either EOF or error; either way return 0
     */
    rval = 0;
  }
  return rval;
}
/**
 * sample main
 */
int main(void)
{
  FILE *input;
  char strings[MAX_NUM_STRINGS][MAX_STR_SIZE];
  int numStrings;
  int value;

  input = fopen("datafile.txt", "r");
  if (input)
  {
    while (getNextLine(input, strings, &numStrings, &value))
    {
      /**
       * Do something with strings and value here
       */
    }
    fclose(input);
  }
  return 0;
}


7 answers (votes, date posted):

Answer 1: 6, 2009-09-05 21:07:04
Answer 2: 1, 2009-09-05 21:28:15
Answer 3: 1, 2009-09-06 00:44:46
Answer 4: 0, 2009-09-05 21:06:54
Answer 5: 0, 2009-09-05 21:27:37
Answer 6: 0, 2009-09-05 21:34:35
Answer 7: 0, 2009-09-06 00:41:37
