简体   繁体   中英

Reading a string of numbers separated by commas

I'm writing a function that is supposed to read a string of numbers, separated by commas. The format of the string is as following:

"1, 2, 3"

The only "rule" is that the function will tolerate any spaces or tabs, as long as there's one comma between each number.

If the string is valid, the numbers are to be stored in a linked list.

for instance, the following strings are valid :

"1,2,14,2,80"
"  250  ,  1,  88"

but the following are not valid :

" 5, 1, 3 ,"
"51, 60, 5,,9"

I first tried my luck with strtok() (using delimiters ", \t", but from what i understand at the moment, it's impossible to check for errors. So I wrote my own function, but I'm extremely unhappy with it - I think the code is pretty bad, and although it seems to work, I'd really like to know if there is a cleaner, easier way to implement such a function.

My function is:

void sliceNumbers(char * string)
{
  /*flag which marks if we're expecting a comma or not*/
  int comma = FALSE;
  /*Are we inside a number?*/
  int nFlag = TRUE;
  /*error flag*/
  int error = FALSE;
  /*pointer to string start*/
  char * pStart = string;
  /*pointer to string end*/
  char * pEnd = pStart;

  /*if received string is null*/
  if (!string)
  {
    /*add error and exit function*/
    printf("You must specify numbers");
    return;
  }
  /*this loop checks if all characters in the string are legal*/
  while (*pStart != '\0')
  {
    if ((isdigit(*pStart)) || (*pStart == ',') || (*pStart == ' ') || (*pStart == '\t'))
    {
      pStart++;
    }
    else
    {
      char tmp[2];
      tmp[0] = *pStart;
      tmp[1] = 0;
      printf("Invalid character");
      error = TRUE;
      pStart++;
    }
  }
  if (!error)
  {
    pStart = string;
    if (*pStart == ',')
    {
    printf("Cannot start data list with a comma");
    return;
    }
    pEnd = pStart;
    while (*pEnd != '\0')
    {
      if (comma)
      {
        if (*pEnd == ',')
        {
          if (!nFlag)
          {

          }
          if (*(pEnd + 1) == '\0')
          {
            printf("Too many commas");
            return;
          }
          *pEnd = '\0';
          /*Add the number to the linked list*/
          addNumber(pStart, line, DC);
          comma = FALSE;
          nFlag = FALSE;
          pStart = pEnd;
          pStart++;
          pEnd = pStart;
        }
        else if (isdigit(*pEnd))
        {
          if (!nFlag)
          {
            printf("numbers must be seperated by commas");
            pEnd++;
          }
          else
          {
            if (*(pEnd + 1) == '\0')
            {
              pEnd++;
              /*Add the number to the linked list*/
              addNumber(pStart);
              comma = FALSE;
              nFlag = FALSE;
              pStart = pEnd;
              pStart++;
              pEnd = pStart;
            }
            else
            {
              pEnd++;
            }
          }
        }
        else if (*pEnd == '\0')
        {
          if (nFlag)
          {
            /*Add the number to the linked list*/
            addNumber(pStart, line, DC);
          }
          else
          {
            printf("Too many commas");
          }

        }
        else if (*pEnd == ' ' || *pEnd == '\t')
        {
          nFlag = FALSE;
          pEnd++;
        }
      }
      else
      {
        if (*pEnd == ',')
        {
          printf("There must be only 1 comma between numbers");
          return;

        }
        else if (isdigit(*pEnd))
        {
          if (*(pEnd + 1) == '\0')
          {
            pEnd++;
            /*Add the number to the linked list*/
            addnumber(pStart, line, DC);
            comma = FALSE;
            nFlag = FALSE;
            pStart = pEnd;
            pStart++;
            pEnd = pStart;
          }
          else
          {
            pStart = pEnd;
            pEnd++;
            nFlag = TRUE;
            comma = TRUE;
          }
        }
        else if (*pEnd == ' ' || *pEnd == '\t')
        {
          if (!nFlag)
          {
            pEnd++;
          }
          else
          {
            pEnd++;
          }
        }
      }
    }
  }
}

You have defined a number of booleans (although you've declared them as int s) that keep track of the current state. You could combine these into one state variable, using #define to define possible values:

#define STATE_START 0
#define STATE_IN_NUMBER 1
#define STATE_COMMA 2
#define STATE_FINISHED 3
#define STATE_ERROR 4

int state = STATE_START;

You can draw a diagram (a bit like a flow chart) showing how each character moves us from one state to another.

在此处输入图像描述

(For my image I've kept things simple and only shown the non-error states for input with no spaces)

Or just put it in words:

current state   | input     | next state| side effect
-----------------------------------------------------------------------
START           | digit     | IN_NUMBER | start storing a number
START           | other     | ERROR     | 
IN_NUMBER       | digit     | IN_NUMBER | continue storing a number
IN_NUMBER       | comma     | COMMA     | complete storing a number
IN_NUMBER       | null      | FINISHED  | finalise output
IN_NUMBER       | other     | ERROR     | report error
COMMA           | digit     | IN_NUMBER | start storing a number
COMMA           | comma     | ERROR     |
COMMA           | other     | ERROR     |

(For my table I've added basic error states, but still haven't accounted for whitespace)

You will need to add some more states and transitions to deal with spaces and tabs, but the principle doesn't change. I'd suggest starting with an implementation that works without spaces, then adding to it.

This allows you to write a finite state machine , one implementation of which looks like this:

int state = STATE_START;
while(state != STATE_FINISHED && state != STATE_ERROR) {
    char c = input[offset++];
    switch(state) {
        case STATE_START:
            state = handleStateStart(...);
            break;
        case STATE_IN_NUMBER:
            state = handleInNumber(...);
            break;
        // etc.
        default:
            sprintf(error_message, "Reached unsupported state: %c", state);
            state = STATE_ERROR;
    }
}

The parameters to the handling functions need to pass in the data structures it will be reading and modifying. For example:

int handleStateStart(
    char c,
    int* current_number,
    char *error_message) 
{
    if( ! isDigit(c)) {
        sprintf(error_message, "Expected a digit at char %d", *offset);
        return STATE_ERROR;
    }
    *current_number = atoi(c);
    return STATE_IN_NUMBER;
}

(This is an easy-to-understand way of implementing a state machine, but there are other ways to do it: Is there a typical state machine implementation pattern? )

Your CSV parsing problem lends itself very well to a state machine, and the resulting code will be very neat and clean. State machines are used for more complex parsing tasks, and are heavily used in things like compilers. Later in your study you will encounter regular expressions -- formally, regular expressions are a compact way of expressing finite state machines that consume characters.

strtok() is the right way to do that. But pass "," (comma) only as delimiter. You can check the resulting string for being zero-length ( strlen(tok)==0 ), which means that you have two consecutive ',' . After checking this you simply trim the results, ie as described here .

You can use regex lib

1) Validate string

[^\d, ]|,[[:blank:]]+,|,{2,}

where
[^\d, ] - found all symbols except digits, commas and white spaces
,[[:blank:]]+,|,{2,} - Validate string 2 or more commas with spaces and tabs without digits between commas

2) Process numbers

\d+

You can try it online here

A quite efficient straight forward approach:

  1. Remove all the spaces and tabs at one pass. You can avoid the space overhead by doing it in place.
  2. Read the stream of numbers and keep adding them to a linked list. If any invalid number, for example, of length 0, would be detected, simply return a NULL pointer and thus stop processing further.
  3. If pass 2 successfully completed, return the head pointer of that linked list.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM