How to extract numbers from string and save them into its own buffer?

I have a string made up of markers and their values. I would like to extract the data from each section of the string and save it into its own buffer. For example, from this string:

char str[] = "+TY123-UP65.4-FA545-MTE565-MTD65-MTT230-MPE545-MPD656-MPT345-";

I want to examine the characters that follow each sign. The "+" sign indicates the start of a new string of data, and each "-" sign indicates the end of a section. Within each section I want to check the first two or three letters: if they are, for example, TY, the number that follows (int or float), up to the "-" sign, should be saved to the "TYbuffer". After the "-", I check the leading letters again; if they are "UP", the number goes into "UPbuffer", and so on.

First, I go through the string like this:

size_t len = strlen(str);
size_t i;
for (i = 0; i < len; i++) { // go through the string
  if (str[i] == '+' && str[i+1] == 'T') { // check for the letters
  }
}

This is an example of how I would do it, but the problem is that the numbers can have more or fewer digits, so the positions shift. I am a bit confused about how to solve this. I tried extracting the numbers from the string, but then I no longer know which section each number came from. Like this:

char tmp[20];
void loop() {
  char str[] = "+TY123-UP65.4-FA545-MTE565-MTD65-MTT230-MPE545-MPD656-MPT345-";
  char *p = str;
  while (*p) {
    if (*p == 'T') {                        // once a 'T' is found...
      while (*p) {                          // ...scan the rest of the string
        if (isdigit((unsigned char)*p)) {   // upon finding a digit
          long val = strtol(p, &p, 10);     // read a number (integer digits only)
          sprintf(tmp, "%ld", val);
          Serial.println(tmp);              // and print it
        } else {
          p++;
        }
      }
    } else {
      p++;                                  // keep looking for a 'T'
    }
  }
}

For the reference, I'm using an Arduino platform.

I've managed to extract the int/float values from the string, with one error that I still have to take care of. I got rid of the "+" sign at the beginning. Here is the updated code:

#include <stdlib.h>

void setup() {
  Serial.begin(9600);
  char str[] = "TY123-UP65.4-FA545-MTE565-MTD65-MTT230-MPE545-MPD656-MPT345-";
  Serial.write(str);
  analyzeString(str);  
}

void loop() {

}

void analyzeString(char* str) {

  int count = 0;                                  //start the count
  char buff[20];                                  //initialize the buffer for 20 chars

    for(int i = 0; i<strlen(str); i++) {          //start the loop for reading the characters from whole string
      if(str[i] == '-') {                         //start the loop when "-" occurs
        char bufNum[20];                          //buffer for the number in a separate section
        //char bufNum1[20];
        if (buff[0] == 'T' && buff[1] == 'Y') {   //If the first letter in the buffer is T and the second is Y
          for(int j = 2; j<count; j++) {           
            bufNum[j-2] = buff[j];
          }
          bufNum[count-2] = '\0';
          Serial.println();
          int ty = atoi(bufNum);                 //atof(string) for float, atoi(string) for int
          Serial.print(ty);                      //integer value

        } else if (buff[0] == 'U' && buff[1] == 'P'){
          for(int j = 2; j<count; j++) {
            bufNum[j-2] = buff[j];
          }
          bufNum[count-2] = '\0';
          Serial.println(bufNum);
          float up = atof(bufNum);
          Serial.print(up);
        } else if (buff[0] == 'F' && buff[1] == 'A'){
          for(int j = 2; j<count; j++) {
            bufNum[j-2] = buff[j];
          }
          bufNum[count-2] = '\0';
          Serial.println(bufNum);
          int fa = atoi(bufNum);
          Serial.print(fa);
        }


        count =0;

      } else {
          buff[count++] = str[i];
      }
    }
}

The error is in the output, which also prints the value of the next section on each line:

TY123-UP65.4-FA545-MTE565-MTD65-MTT230-MPE545-MPD656-MPT345-
12365.4
65.40545
545

I'm looking for guidance on how I should approach the problem. I would appreciate any help. Thank you.

There are a number of ways to approach parsing your sections from the string and then parsing the labels and values from each section. First and foremost, there is absolutely nothing wrong with simply walking a pointer down the string, checking for '+', '-', 'letter', 'digit' as you go and taking the appropriate action.
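
As a rough illustration of the pointer-walking approach, here is a minimal sketch. The names parse_sections, struct entry and LABEL_MAX are made up for this example and are not from the question. It assumes every section is a run of letters followed by a number that starts with a digit, and it leans on strtod only for the final conversion.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

#define LABEL_MAX 8             /* hypothetical limit on label length (incl. NUL) */

struct entry {                  /* hypothetical holder for one section */
    char label[LABEL_MAX];
    double value;
};

/* Walk s one character at a time; return the number of sections stored. */
static size_t parse_sections(const char *s, struct entry *out, size_t max)
{
    size_t n = 0;

    assert(s != NULL && out != NULL);
    while (*s != '\0' && n < max) {
        if (*s == '+' || *s == '-') {           /* delimiter: skip it */
            s++;
            continue;
        }

        size_t len = 0;
        while (isalpha((unsigned char)*s)) {    /* letters form the label */
            if (len < LABEL_MAX - 1)
                out[n].label[len++] = *s;       /* overlong labels are truncated */
            s++;
        }
        out[n].label[len] = '\0';

        if (len > 0 && isdigit((unsigned char)*s)) {
            char *end;
            out[n].value = strtod(s, &end);     /* digits (and '.') form the value */
            s = end;
            n++;                                /* keep only complete sections */
        } else if (*s != '\0' && *s != '+' && *s != '-') {
            s++;                                /* skip a character that fits nowhere */
        }
    }
    return n;
}
```

Because the label is read before the number, each value stays tied to the section it came from, however many digits it has.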

However, C also provides a couple of handy tools in strtok and strtod that can automate the section parsing and validate the numeric conversions for you, which takes some of the tedium out of it. You can also simply store the numeric values as double (or as float, using strtof instead of strtod) and then handle output with %g, which prints the fractional part only if it is present.
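
To make the %g remark concrete, here is a tiny sketch. The helper name format_value is hypothetical. It assumes a hosted C library whose snprintf supports floating point, which may not hold for every embedded toolchain. Note also that %g switches to exponent notation for values that need more than six significant digits.

```c
#include <assert.h>
#include <stdio.h>

/* Format v into out with %g. Returns what snprintf returns: the number of
   characters the full result needs, excluding the terminating NUL. */
static int format_value(double v, char *out, size_t size)
{
    assert(out != NULL && size > 0);
    return snprintf(out, size, "%g", v);
}
```

With this, 123.0 is formatted as "123" and 65.4 as "65.4", so integer and fractional sections can share one double field.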

For example, you could do something similar to:

#include <stdio.h>
#include <stdlib.h>     /* for strtod  */
#include <string.h>     /* for strlen  */
#include <ctype.h>      /* for isalpha */
#include <errno.h>      /* for errno   */

#define MAXSECT 16      /* if you need a constant, define one */

typedef struct {        /* struct to hold label & value */
    char str[MAXSECT];
    double d;
} section;

int main (void) {

    int n = 0;              /* number of sections */
    char buf[] = "+TY123-UP65.4-FA545-MTE565-MTD65-MTT230-MPE545-MPD656-MPT345-",
        *p = buf,           /* pointer to buf */
        *delim = "+-";      /* delimiters for strtok */
    section sect[MAXSECT] = {{ .str = "" }};    /* array of MAXSECT sections */

    /* tokenize buf splitting on '+' and '-' */
    for (p = strtok(p, delim); p; p = strtok (NULL, delim)) {
        size_t len = strlen (p), idx = 0;   /* length and label index */
        char *ep = p;       /* 'endptr' for strtod */

        if (len + 1 > MAXSECT) {    /* check length of section fits */
            fprintf (stderr, "error: section too long '%zu' chars '%s'.\n",
                    len, p);
            continue;
        }

        while (isalpha ((unsigned char)*p)) /* while letters, copy to sect[n].str */
            sect[n].str[idx++] = *p++;
        sect[n].str[idx] = 0;   /* nul-terminate sect[n].str */

        errno = 0;
        sect[n].d = strtod (p, &ep);    /* convert value, store as double */
        if (p == ep) {                  /* validate digits converted */
            fprintf (stderr, "error: no digits converted in section '%s'.\n",
                    sect[n].str);
            continue;
        }
        if (errno) {                    /* validate no error on conversion */
            perror ("conversion to number failed");
            continue;
        }
        if (++n == MAXSECT) {           /* increment n, check if array full */
            fprintf (stderr, "sect array full.\n");
            break;
        }
    }

    for (int i = 0; i < n; i++) /* output results, fraction only if present */
        printf ("buf[%3s] : %g\n", sect[i].str, sect[i].d);

    return 0;
}

(note: if compiling with an old C89 compiler (e.g. Win7/VS 10), you can initialize section sect[MAXSECT] = {{0},0}; since C89 did not provide designated initializers. You will also need to declare the counter variable i at the top along with n.)

Example Use/Output

$ ./bin/strtok_str_sect
buf[ TY] : 123
buf[ UP] : 65.4
buf[ FA] : 545
buf[MTE] : 565
buf[MTD] : 65
buf[MTT] : 230
buf[MPE] : 545
buf[MPD] : 656
buf[MPT] : 345
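
To go from label/value pairs like these to the separate per-label buffers the question asks for, one option is a small lookup table. The sketch below is only an illustration: struct readings and store_value are hypothetical names, and the list of labels is taken from the example string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct readings {           /* hypothetical: one slot per label in the question */
    double ty, up, fa, mte, mtd, mtt, mpe, mpd, mpt;
};

/* Store value in the slot named by label.
   Returns 1 if the label is known, 0 otherwise. */
static int store_value(struct readings *r, const char *label, double value)
{
    assert(r != NULL && label != NULL);

    /* table pairing each label with the address of its slot in r */
    const struct { const char *name; double *slot; } table[] = {
        { "TY",  &r->ty  }, { "UP",  &r->up  }, { "FA",  &r->fa  },
        { "MTE", &r->mte }, { "MTD", &r->mtd }, { "MTT", &r->mtt },
        { "MPE", &r->mpe }, { "MPD", &r->mpd }, { "MPT", &r->mpt },
    };

    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strcmp(label, table[i].name) == 0) {
            *table[i].slot = value;
            return 1;
        }
    }
    return 0;
}
```

It would be called once per parsed section, for example store_value(&r, sect[i].str, sect[i].d) inside the output loop of the program above. An unknown label returns 0 and leaves the struct untouched.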

Look over all of the answers; there is good learning to be had in each. Let me know if you have further questions.

I think this solution is a bit of overkill, and I wrote it quickly just to give you an idea.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <unistd.h>

struct data {
    char id[20];
    int i_value;
    float f_value;
};


static int get_ar_size(const char *str)
{
    int count = 0;

    while (*str != '\0') {
        if (*str == '+' || *str == '-')
            count++;
        str++;
    }
    return (count);
}

static int is_float_string(const char **tmp)
{
    int is_float = 1;
    int is_int = 0;
    for(; *(*tmp) != '\0' && *(*tmp) != '+' && *(*tmp) != '-'; (*tmp)++) {
        if (*(*tmp) == '.')
            return (is_float);
    }

    return (is_int);
}

static void get_info_from_string(const char *str, int i,
                                 struct data strct_arr[])
{
    int j = 0;
    const char *tmp = NULL;

    /* write the letter ID (two or three letters) into the id array */
    while (isalpha((unsigned char)*str) && j < (int)sizeof(strct_arr[i].id) - 1) {
        strct_arr[i].id[j++] = *str++;
    }
    strct_arr[i].id[j] = '\0';  /* nul-terminate the ID */

    tmp = str;
    /* then convert the value that follows the letter ID */
    if (isdigit((unsigned char)*tmp)) {
        /* a '.' before the next delimiter marks a float, otherwise an int */
        if (is_float_string(&tmp))
            strct_arr[i].f_value = atof(str);
        else
            strct_arr[i].i_value = atoi(str);
    }
}


int main(void)
{
    const char *str = "+TY123-UP65.4-FA545-MTE565-MTD65-MTT230-MPE545-MPD656-MPT345-";
    int size = 0;
    int index = 0;

    /*count every '+' and '-' it would be our size for struct array*/
    size = get_ar_size(str);

    /* create array of structure which has first letter buf id and its value */
    struct data *strct_arr = calloc(size, sizeof(struct data)); /* zero-initialized */
    if (strct_arr == NULL) {
        perror("calloc failed");
        return EXIT_FAILURE;
    }

    for (index = 0; *str != '\0'; str++) {
        if ((*str == '+' || *str == '-') && isalpha((unsigned char)*(str + 1))) {
            str++;
            get_info_from_string(str, index, strct_arr);
            index++;
        }
    }

    size = index;   /* sections actually parsed; the trailing '-' starts none */
    index = 0;
    while (index < size) {
        /* i_value stays 0 for float sections (an integer value of 0 also lands here) */
        if (strct_arr[index].i_value == 0) {
            printf("ID [%s] float %.1f\n", strct_arr[index].id, strct_arr[index].f_value);
        }
        else
            printf("ID [%s] int %d\n", strct_arr[index].id, strct_arr[index].i_value);
        index++;
    }
    free(strct_arr);
    return 0;
}

It has almost no error checking, so be careful; treat it as a starting point for your own rewrite. This is the output I get from your string:

"+TY123-UP65.4-FA545-MTE565-MTD65-MTT230-MPE545-MPD656-MPT345-";

ID [TY] int 123
ID [UP] float 65.4
ID [FA] int 545
ID [MTE] int 565
ID [MTD] int 65
ID [MTT] int 230
ID [MPE] int 545
ID [MPD] int 656
ID [MPT] int 345

This is a method using strtok. I think strtok is not available on Arduino; if so, you can use strtok_r instead.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

int main(void)
{
    char str[] = "+TY123-UP65.4-FA545-MTE565-MTD65-MTT230-MPE545-MPD656-MPT345-";
    //char *context;
    //char *token = strtok_r(str, "+-", &context);
    char *token = strtok(str, "+-");
    while(token)
    {
        for(size_t i = 0, len = strlen(token); i < len; i++)
        {
            if(!isdigit(token[i])) continue;

            //text part, example TY
            char str_name[10];
            strncpy(str_name, token, i);
            str_name[i] = 0;

            //number part, example 123
            char *temp = token + i;

            if(strstr(temp, "."))
                printf("%s %.2f\n", str_name, strtof(temp, &temp));
            else
                printf("%s %ld\n", str_name, strtol(temp, &temp, 10));

            break;
        }
        //token = strtok_r(NULL, "+-", &context);
        token = strtok(NULL, "+-");
    }
    return 0;
}

It seems to me that what you are building is a lexer. You might find some good information at this question: lexers vs parsers

One way to approach this is to define some token types. Each type has a set of characters to match and a list of allowed following types. This can be in data structures or hard coded, but you should write it out to clear your thinking.

  1. Start off expecting whatever your valid first types are.
  2. Grab a character
  3. Figure out what token type it fits by going through the list of valid types. Start a new token of that type.
  4. Grab another character.
  5. If the character fits into the current token, you add it to the string. That can be a copy of the original string, or you can have a pointer to the symbol start and a count of characters. Repeat from 4.
  6. If it doesn't fit then you output the token. That output could be a return value, an "out" parameter, a function callback or adding it to a data structure like a list or array. Repeat from 2. Or if you returned to a caller wait to be called again.
  7. Eventually you run out of data or hit an error and exit.

Each token is a data struct of its own. It should include the position that the token was found at and the type of the thing the token is. And of course the token string.

Your token types look like [+-] followed by [A-Z] followed by [0-9.], and then back to the beginning with [+-].
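
The steps above could be sketched roughly as follows. This is a minimal illustration, not a complete lexer: the names tok_type, struct token, classify and next_token are invented for the example, the token string is represented as a start offset plus a length, and the check on which token type may follow which is left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

enum tok_type { TOK_DELIM, TOK_LABEL, TOK_NUMBER, TOK_ERROR };

struct token {
    enum tok_type type;     /* what kind of thing the token is */
    size_t pos;             /* offset of the token in the input */
    size_t len;             /* number of characters in the token */
};

/* Decide which token type a single character belongs to. */
static enum tok_type classify(char c)
{
    unsigned char u = (unsigned char)c;

    if (c == '+' || c == '-')   return TOK_DELIM;
    if (isupper(u))             return TOK_LABEL;
    if (isdigit(u) || c == '.') return TOK_NUMBER;
    return TOK_ERROR;
}

/* Read one token starting at *pos and advance *pos past it.
   Returns 0 at end of input, 1 when a token was produced. */
static int next_token(const char *src, size_t *pos, struct token *tok)
{
    assert(src != NULL && pos != NULL && tok != NULL);
    if (src[*pos] == '\0')
        return 0;

    tok->type = classify(src[*pos]);    /* steps 2-3: grab a character, pick a type */
    tok->pos = *pos;
    tok->len = 1;
    (*pos)++;

    /* delimiters and unknown characters are single-character tokens */
    if (tok->type == TOK_LABEL || tok->type == TOK_NUMBER) {
        /* steps 4-5: extend while the next character fits the current token */
        while (src[*pos] != '\0' && classify(src[*pos]) == tok->type) {
            tok->len++;
            (*pos)++;
        }
    }
    return 1;                           /* step 6: hand the token to the caller */
}
```

The caller keeps calling next_token until it returns 0, and can then pair each TOK_LABEL with the TOK_NUMBER that follows it.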

You could replace steps 4 and 5 with the strspn function.
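
For instance, strspn returns the length of the leading run of characters drawn from a given set, which is exactly the "keep adding characters while they fit" loop. The helper names label_len and number_len below are hypothetical, and the character sets assume capital-letter labels and unsigned decimal numbers as in the example string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of the label (run of capital letters) at the start of s. */
static size_t label_len(const char *s)
{
    assert(s != NULL);
    return strspn(s, "ABCDEFGHIJKLMNOPQRSTUVWXYZ");
}

/* Length of the number (run of digits and '.') at the start of s. */
static size_t number_len(const char *s)
{
    assert(s != NULL);
    return strspn(s, "0123456789.");
}
```

For the section "MTE565-", label_len gives 3, and number_len applied to the rest ("565-") also gives 3, so the token boundaries fall out without a hand-written loop.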

After this the code receiving the tokens should have an easier time since it will not have the low level details of reading each character mixed into it.
