
Reading a string of numbers separated by commas

I'm writing a function that is supposed to read a string of numbers separated by commas. The format of the string is as follows:

"1, 2, 3"

The only "rule" is that the function will tolerate any spaces or tabs, as long as there is exactly one comma between consecutive numbers.

If the string is valid, the numbers are to be stored in a linked list.

For instance, the following strings are valid:

"1,2,14,2,80"
"  250  ,  1,  88"

but the following are not valid:

" 5, 1, 3 ,"
"51, 60, 5,,9"

I first tried my luck with strtok() (using the delimiters ", \t"), but from what I understand at the moment, it's impossible to check for errors with it. So I wrote my own function, but I'm extremely unhappy with it. I think the code is pretty bad, and although it seems to work, I'd really like to know if there is a cleaner, easier way to implement such a function.
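To see why strtok() hides these errors, here is a small sketch (count_tokens is a hypothetical helper, not part of the question). strtok() treats a run of delimiter characters as a single separator and skips leading and trailing delimiters, so a valid and an invalid string can produce exactly the same tokens.

```c
#include <string.h>

/* Counts the tokens strtok() returns for str (str is modified in place). */
int count_tokens(char *str, const char *delims)
{
    int n = 0;
    for (char *tok = strtok(str, delims); tok != NULL; tok = strtok(NULL, delims))
        n++;
    return n;
}
```

With the delimiters ", \t", both "51, 60, 5,,9" and "51, 60, 5, 9" yield the same four tokens (51, 60, 5 and 9), so the doubled comma is invisible to the caller.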

My function is:

#include <ctype.h>
#include <stdio.h>

#define TRUE 1
#define FALSE 0

/*appends one number (given as text) to the linked list; defined elsewhere*/
void addNumber(char * numberText);

void sliceNumbers(char * string)
{
  /*flag which marks if we're expecting a comma or not*/
  int comma = FALSE;
  /*Are we inside a number?*/
  int nFlag = TRUE;
  /*error flag*/
  int error = FALSE;
  /*pointer to string start*/
  char * pStart = string;
  /*pointer to string end*/
  char * pEnd = pStart;

  /*if received string is null*/
  if (!string)
  {
    /*report error and exit function*/
    printf("You must specify numbers\n");
    return;
  }
  /*this loop checks if all characters in the string are legal*/
  while (*pStart != '\0')
  {
    if ((isdigit((unsigned char)*pStart)) || (*pStart == ',') || (*pStart == ' ') || (*pStart == '\t'))
    {
      pStart++;
    }
    else
    {
      printf("Invalid character: %c\n", *pStart);
      error = TRUE;
      pStart++;
    }
  }
  if (!error)
  {
    pStart = string;
    if (*pStart == ',')
    {
      printf("Cannot start data list with a comma\n");
      return;
    }
    pEnd = pStart;
    while (*pEnd != '\0')
    {
      if (comma)
      {
        /*we are inside a number, or in the blanks right after one*/
        if (*pEnd == ',')
        {
          if (*(pEnd + 1) == '\0')
          {
            printf("Too many commas\n");
            return;
          }
          /*terminate the number text in place*/
          *pEnd = '\0';
          /*Add the number to the linked list*/
          addNumber(pStart);
          comma = FALSE;
          nFlag = FALSE;
          pStart = pEnd + 1;
          pEnd = pStart;
        }
        else if (isdigit((unsigned char)*pEnd))
        {
          if (!nFlag)
          {
            /*a digit after blanks means two numbers without a comma*/
            printf("numbers must be separated by commas\n");
            return;
          }
          pEnd++;
        }
        else /*space or tab*/
        {
          nFlag = FALSE;
          pEnd++;
        }
      }
      else
      {
        /*we are expecting the start of a number*/
        if (*pEnd == ',')
        {
          printf("There must be only 1 comma between numbers\n");
          return;
        }
        else if (isdigit((unsigned char)*pEnd))
        {
          pStart = pEnd;
          pEnd++;
          nFlag = TRUE;
          comma = TRUE;
        }
        else /*space or tab*/
        {
          pEnd++;
        }
      }
    }
    /*end of string: the last number has not been added yet*/
    if (comma)
    {
      addNumber(pStart);
    }
    else
    {
      printf("Expected a number\n");
    }
  }
}

You have defined a number of booleans (although you've declared them as ints) that keep track of the current state. You could combine these into one state variable, using #define to define its possible values:

#define STATE_START 0
#define STATE_IN_NUMBER 1
#define STATE_COMMA 2
#define STATE_FINISHED 3
#define STATE_ERROR 4

int state = STATE_START;

You can draw a diagram (a bit like a flow chart) showing how each character moves us from one state to another.

[Image: state diagram]

(For my image I've kept things simple and only shown the non-error states, for input with no spaces.)

Or just put it in words:

current state   | input     | next state | side effect
------------------------------------------------------------------------
START           | digit     | IN_NUMBER  | start storing a number
START           | other     | ERROR      | report error
IN_NUMBER       | digit     | IN_NUMBER  | continue storing the number
IN_NUMBER       | comma     | COMMA      | complete storing the number
IN_NUMBER       | '\0'      | FINISHED   | complete storing the number; finalise output
IN_NUMBER       | other     | ERROR      | report error
COMMA           | digit     | IN_NUMBER  | start storing a number
COMMA           | comma     | ERROR      | report error
COMMA           | other     | ERROR      | report error

(For my table I've added basic error states, but still haven't accounted for whitespace.)

You will need to add some more states and transitions to deal with spaces and tabs, but the principle doesn't change. I'd suggest starting with an implementation that works without spaces, then extending it.

This allows you to write a finite state machine, one implementation of which looks like this:

int state = STATE_START;
while(state != STATE_FINISHED && state != STATE_ERROR) {
    char c = input[offset++];
    switch(state) {
        case STATE_START:
            state = handleStateStart(...);
            break;
        case STATE_IN_NUMBER:
            state = handleInNumber(...);
            break;
        // etc.
        default:
            sprintf(error_message, "Reached unsupported state: %d", state);
            state = STATE_ERROR;
    }
}

The handling functions need parameters for the data structures they will read and modify. For example:

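#include <ctype.h>
#include <stdio.h>

int handleStateStart(
    char c,
    int offset,
    int* current_number,
    char *error_message)
{
    if (!isdigit((unsigned char)c)) {
        sprintf(error_message, "Expected a digit at char %d", offset);
        return STATE_ERROR;
    }
    /* convert the digit character to its value ('7' -> 7) */
    *current_number = c - '0';
    return STATE_IN_NUMBER;
}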

(This is an easy-to-understand way of implementing a state machine, but there are other ways to do it; see "Is there a typical state machine implementation pattern?")

Your CSV parsing problem lends itself very well to a state machine, and the resulting code will be very neat and clean. State machines are also used for more complex parsing tasks, and are heavily used in things like compilers. Later in your studies you will encounter regular expressions; formally, a regular expression is a compact way of describing a finite state machine that consumes characters.
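Putting the pieces above together, here is a minimal, self-contained sketch that also handles spaces and tabs. It assumes one extra state, STATE_AFTER_NUMBER (a hypothetical name, not from the answer), for blanks that follow a number; it inlines the handler functions into the switch for brevity and stores the numbers in an array rather than a linked list to keep it short.

```c
#include <ctype.h>
#include <stdio.h>

#define STATE_START 0
#define STATE_IN_NUMBER 1
#define STATE_COMMA 2
#define STATE_FINISHED 3
#define STATE_ERROR 4
#define STATE_AFTER_NUMBER 5 /* hypothetical extra state: blanks after a number */

#define IS_BLANK(c) ((c) == ' ' || (c) == '\t')

/* Parses input into out[]; returns the count, or -1 on error (a message is
   written to error_message, which must hold at least 64 chars). */
int parse_numbers_fsm(const char *input, long out[], int max, char *error_message)
{
    int state = STATE_START;
    int offset = 0, count = 0;
    long current = 0;

    while (state != STATE_FINISHED && state != STATE_ERROR) {
        char c = input[offset++];
        switch (state) {
        case STATE_START: /* before the first number */
        case STATE_COMMA: /* after a comma: a number must follow */
            if (IS_BLANK(c))
                break; /* stay in the same state */
            if (isdigit((unsigned char)c)) {
                current = c - '0';
                state = STATE_IN_NUMBER;
            } else {
                sprintf(error_message, "Expected a digit at char %d", offset - 1);
                state = STATE_ERROR;
            }
            break;
        case STATE_IN_NUMBER:
            if (isdigit((unsigned char)c)) {
                current = current * 10 + (c - '0');
                break;
            }
            if (c != ',' && c != '\0' && !IS_BLANK(c)) {
                sprintf(error_message, "Unexpected character at char %d", offset - 1);
                state = STATE_ERROR;
                break;
            }
            if (count == max) {
                sprintf(error_message, "Too many numbers");
                state = STATE_ERROR;
                break;
            }
            out[count++] = current; /* the number is complete */
            state = (c == ',') ? STATE_COMMA
                  : (c == '\0') ? STATE_FINISHED
                  : STATE_AFTER_NUMBER;
            break;
        case STATE_AFTER_NUMBER: /* blanks after a number: need ',' or end */
            if (IS_BLANK(c))
                break;
            if (c == ',') {
                state = STATE_COMMA;
            } else if (c == '\0') {
                state = STATE_FINISHED;
            } else {
                sprintf(error_message, "Expected a comma at char %d", offset - 1);
                state = STATE_ERROR;
            }
            break;
        default:
            sprintf(error_message, "Reached unsupported state: %d", state);
            state = STATE_ERROR;
        }
    }
    return state == STATE_FINISHED ? count : -1;
}
```

Because every state decides what to do with '\0', the loop never reads past the terminator: a trailing comma ends in STATE_COMMA, which rejects '\0', and "1 2" is rejected in STATE_AFTER_NUMBER.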

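strtok() can be made to work here, but only if you pass "," (comma) alone as the delimiter, so that spaces and tabs stay inside the tokens. Trim each token, as described here, and reject it if it is empty after trimming (for example the middle field of "5, ,9"). Be aware that strtok() never returns an empty token: it treats consecutive commas as a single delimiter and skips leading and trailing commas, so strlen(tok)==0 cannot detect "5,,9" or a trailing comma. Those cases need a separate check, such as comparing the number of tokens with the number of commas.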
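A minimal sketch of this strtok() approach, assuming the numbers go into an array (the question wants a linked list; each strtol() result could be appended to it instead). The helper names trim_blanks, all_digits and parse_list are hypothetical. Because strtok() silently drops empty fields, the sketch counts the commas first: a valid list must produce exactly one more token than there are commas.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Trims leading/trailing spaces and tabs in place; returns the new start. */
static char *trim_blanks(char *s)
{
    while (*s == ' ' || *s == '\t')
        s++;
    char *end = s + strlen(s);
    while (end > s && (end[-1] == ' ' || end[-1] == '\t'))
        end--;
    *end = '\0';
    return s;
}

/* Returns 1 if s is non-empty and consists only of decimal digits. */
static int all_digits(const char *s)
{
    if (*s == '\0')
        return 0;
    for (; *s != '\0'; s++)
        if (!isdigit((unsigned char)*s))
            return 0;
    return 1;
}

/* Splits a writable string in place and stores up to max numbers in out.
   Returns the count, or -1 if the list is malformed. */
int parse_list(char *str, long out[], int max)
{
    int commas = 0, count = 0;
    for (const char *p = str; *p != '\0'; p++)
        if (*p == ',')
            commas++;

    for (char *tok = strtok(str, ","); tok != NULL; tok = strtok(NULL, ",")) {
        char *t = trim_blanks(tok);
        if (!all_digits(t) || count == max)
            return -1; /* blank-only field, stray character, or too many */
        out[count++] = strtol(t, NULL, 10);
    }
    /* strtok() skips empty fields (",,", leading or trailing ","),
       so any of those leaves fewer tokens than commas + 1. */
    return count == commas + 1 ? count : -1;
}
```

Note that strtok() keeps internal state and is not reentrant; strtok_r() (POSIX) avoids that if the parser is ever called from several threads.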

You can use a regular expression library:

1) Validate the string:

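[^0-9,[:blank:]]|,[[:blank:]]*,|^[[:blank:]]*,|,[[:blank:]]*$|[0-9][[:blank:]]+[0-9]

where (if the pattern matches anywhere, the string is invalid):
[^0-9,[:blank:]] - matches any character other than a digit, comma, space or tab
,[[:blank:]]*, - matches two commas with nothing but spaces or tabs between them
^[[:blank:]]*, and ,[[:blank:]]*$ - match a leading or a trailing comma
[0-9][[:blank:]]+[0-9] - matches two numbers separated only by spaces or tabs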

2) Extract the numbers:

\d+

You can try it online here.
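A sketch of this approach with the POSIX <regex.h> API (regcomp/regexec), which is available on POSIX systems but is not part of standard C. POSIX extended regular expressions do not support \d, so [0-9] and [[:blank:]] (space or tab) are used. The validation pattern here also rejects leading and trailing commas and blank-separated digits, which the question's invalid examples require. The function name regex_parse is hypothetical, and the numbers go into an array rather than a linked list for brevity.

```c
#include <regex.h>
#include <stdlib.h>

/* Matches anything that makes the list invalid. */
static const char *INVALID_PATTERN =
    "[^0-9,[:blank:]]"         /* a character that is not a digit, comma or blank */
    "|,[[:blank:]]*,"          /* two commas with only blanks between them */
    "|^[[:blank:]]*,"          /* leading comma */
    "|,[[:blank:]]*$"          /* trailing comma */
    "|[0-9][[:blank:]]+[0-9]"; /* two numbers with no comma between them */

/* Returns the number of values stored in out[], or -1 if the string is
   invalid, contains no numbers, or holds more than max numbers. */
int regex_parse(const char *str, long out[], int max)
{
    regex_t invalid, number;
    regmatch_t m;
    int count = 0;

    /* Step 1: validate the whole string. */
    if (regcomp(&invalid, INVALID_PATTERN, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int bad = (regexec(&invalid, str, 0, NULL, 0) == 0);
    regfree(&invalid);
    if (bad)
        return -1;

    /* Step 2: pull out each run of digits, resuming after the last match. */
    if (regcomp(&number, "[0-9]+", REG_EXTENDED) != 0)
        return -1;
    for (const char *p = str; regexec(&number, p, 1, &m, 0) == 0; p += m.rm_eo) {
        if (count == max) {
            count = -1;
            break;
        }
        out[count++] = strtol(p + m.rm_so, NULL, 10);
    }
    regfree(&number);
    return count > 0 ? count : -1;
}
```

Compiling the patterns on every call keeps the sketch short; a real parser would compile them once and reuse them.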

A quite efficient, straightforward approach:

  1. Remove all the spaces and tabs in one pass. You can avoid the space overhead by doing it in place.
  2. Read the stream of numbers and keep adding them to a linked list. If an invalid number is detected, for example one of length 0, simply return a NULL pointer and stop processing.
  3. If pass 2 completes successfully, return the head pointer of the linked list.

Note that step 1 turns "1 2" into "12"; if digits separated only by blanks must be rejected, check for that before removing the blanks.
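The three passes above can be sketched as follows. The struct Node type and the function names are hypothetical, chosen for illustration. The blank-removal pass also rejects digits separated only by blanks (such as "1 2"), since removing the blanks would otherwise merge them into 12.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical linked-list node. */
struct Node {
    long value;
    struct Node *next;
};

static void free_list(struct Node *head)
{
    while (head != NULL) {
        struct Node *next = head->next;
        free(head);
        head = next;
    }
}

/* Pass 1: removes spaces and tabs in place. Returns 0 if blanks separate
   two digits (e.g. "1 2"), which would otherwise merge into one number. */
static int remove_blanks(char *s)
{
    char *dst = s;
    int saw_blank = 0;
    for (const char *src = s; *src != '\0'; src++) {
        if (*src == ' ' || *src == '\t') {
            saw_blank = 1;
            continue;
        }
        if (saw_blank && dst > s && isdigit((unsigned char)dst[-1])
            && isdigit((unsigned char)*src))
            return 0;
        saw_blank = 0;
        *dst++ = *src;
    }
    *dst = '\0';
    return 1;
}

/* Passes 2 and 3: returns the list head, or NULL if the input is invalid. */
struct Node *parse_numbers(char *s)
{
    struct Node *head = NULL, **tail = &head;
    if (!remove_blanks(s))
        return NULL;
    for (;;) {
        const char *start = s;
        long value = 0;
        while (isdigit((unsigned char)*s))
            value = value * 10 + (*s++ - '0');
        /* length-0 number (",,", leading/trailing ",") or stray character */
        if (s == start || (*s != ',' && *s != '\0')) {
            free_list(head);
            return NULL;
        }
        struct Node *node = malloc(sizeof *node);
        if (node == NULL) {
            free_list(head);
            return NULL;
        }
        node->value = value;
        node->next = NULL;
        *tail = node; /* append at the end to keep input order */
        tail = &node->next;
        if (*s == '\0')
            return head; /* whole string consumed successfully */
        s++; /* skip the comma */
    }
}
```

The sketch does not check for overflow of very long numbers; strtol() with an errno check would be a safer choice there.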

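Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.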

 
© 2020-2024 STACKOOM.COM