簡體   English   中英

如何用大於一個字符的定界符分割字符串?

[英]How to split a string with a delimiter larger than one single char?

假設我有這個:

"foo bar 1 and foo bar 2"

如何將其拆分為:

foo bar 1
foo bar 2

我嘗試了strtok()strsep()但是都沒有用。 他們不將“和”識別為定界符,而是將“ a”,“ n”和“ d”識別為定界符。

有什么功能可以幫助我,或者我必須按空格分開並進行一些字符串操作?

您可以使用strstr()查找第一個“和”,然后自己跳過字符串,然后再次進行查找,從而自己對字符串進行“標記”。

在C中拆分字符串的主要問題是,它不可避免地導致了一些動態內存管理,並且在可能的情況下,標准庫往往會避免這種情況。 這就是為什么沒有標准C函數處理動態內存分配的原因,只有malloc / calloc / realloc可以做到這一點。

但是自己做這件事並不難。 讓我一窺。

我們需要返回多個字符串,最簡單的方法是返回一個指向字符串的指針數組,該數組以NULL項終止。 除了最后一個NULL外,數組中的每個元素都指向動態分配的字符串。

首先,我們需要幾個輔助函數來處理這樣的數組。 最簡單的一種是計算字符串數(最后一個NULL之前的元素):

/* Return length of a NULL-delimited array of strings. */
size_t str_array_len(char **array)
{
    size_t len;

    for (len = 0; array[len] != NULL; ++len)
        continue;
    return len;
}

另一個簡單的函數是釋放數組的函數:

/* Free a dynamic array of dynamic strings. */
void str_array_free(char **array)
{
    if (array == NULL)
        return;
    for (size_t i = 0; array[i] != NULL; ++i)
        free(array[i]);
    free(array);
}

將字符串的副本添加到數組的函數更加復雜。 它需要處理許多特殊情況,例如當數組尚不存在時(整個數組為NULL)。 另外,它需要處理未以'\\ 0'結尾的字符串,以便我們的實際拆分功能更容易在追加時僅使用輸入字符串的一部分。

/* Append an item to a dynamically allocated array of strings. On failure,
   return NULL, in which case the original array is intact. The item
   string is dynamically copied. If the array is NULL, allocate a new
   array. Otherwise, extend the array. Make sure the array is always
   NULL-terminated. Input string might not be '\0'-terminated. */
char **str_array_append(char **array, size_t nitems, const char *item, 
                        size_t itemlen)
{
    /* Make a dynamic copy of the item. */
    char *copy;
    if (item == NULL)
        copy = NULL;
    else {
        copy = malloc(itemlen + 1);
        if (copy == NULL)
            return NULL;
        memcpy(copy, item, itemlen);
        copy[itemlen] = '\0';
    }

    /* Extend array with one element. Except extend it by two elements, 
       in case it did not yet exist. This might mean it is a teeny bit
       too big, but we don't care. */
    array = realloc(array, (nitems + 2) * sizeof(array[0]));
    if (array == NULL) {
        free(copy);
        return NULL;
    }

    /* Add copy of item to array, and return it. */
    array[nitems] = copy;
    array[nitems+1] = NULL;
    return array;
}

太惡心了 對於真正好的樣式,最好是將輸入的內容復制為自己的功能,從而分割出動態副本的內容,但我將其作為練習內容留給讀者。

最后,我們有實際的拆分功能。 它還需要處理一些特殊情況:

  • 輸入字符串可以以分隔符開頭或結尾。
  • 可能有彼此相鄰的分隔符。
  • 輸入字符串可能根本不包含分隔符。

如果分隔符在輸入字符串的開頭或結尾旁邊,或者在另一個分隔符的旁邊,我選擇在結果中添加一個空字符串。 如果需要其他功能,則需要調整代碼。

除了特殊情況和一些錯誤處理之外,拆分現在非常簡單。

/* Split a string into substrings. Return dynamic array of dynamically
   allocated substrings, or NULL if there was an error. Caller is
   expected to free the memory, for example with str_array_free. */
char **str_split(const char *input, const char *sep)
{
    size_t nitems = 0;
    char **array = NULL;
    const char *start = input;
    char *next = strstr(start, sep);
    size_t seplen = strlen(sep);
    const char *item;
    size_t itemlen;

    for (;;) {
        next = strstr(start, sep);
        if (next == NULL) {
            /* Add the remaining string (or empty string, if input ends with
               separator. */
            char **new = str_array_append(array, nitems, start, strlen(start));
            if (new == NULL) {
                str_array_free(array);
                return NULL;
            }
            array = new;
            ++nitems;
            break;
        } else if (next == input) {
            /* Input starts with separator. */
            item = "";
            itemlen = 0;
        } else {
            item = start;
            itemlen = next - item;
        }
        char **new = str_array_append(array, nitems, item, itemlen);
        if (new == NULL) {
            str_array_free(array);
            return NULL;
        }
        array = new;
        ++nitems;
        start = next + seplen;
    }

    if (nitems == 0) {
        /* Input does not contain separator at all. */
        assert(array == NULL);
        array = str_array_append(array, nitems, input, strlen(input));
    }

    return array;
}

這是整個程序的一部分。 它還包括一個運行某些測試用例的主程序。

#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>


/* Append an item to a dynamically allocated array of strings. On failure,
   return NULL, in which case the original array is intact. The item
   string is dynamically copied. If the array is NULL, allocate a new
   array. Otherwise, extend the array. Make sure the array is always
   NULL-terminated. Input string might not be '\0'-terminated. */
char **str_array_append(char **array, size_t nitems, const char *item, 
                        size_t itemlen)
{
    /* Make a dynamic copy of the item. */
    char *copy;
    if (item == NULL)
        copy = NULL;
    else {
        copy = malloc(itemlen + 1);
        if (copy == NULL)
            return NULL;
        memcpy(copy, item, itemlen);
        copy[itemlen] = '\0';
    }

    /* Extend array with one element. Except extend it by two elements, 
       in case it did not yet exist. This might mean it is a teeny bit
       too big, but we don't care. */
    array = realloc(array, (nitems + 2) * sizeof(array[0]));
    if (array == NULL) {
        free(copy);
        return NULL;
    }

    /* Add copy of item to array, and return it. */
    array[nitems] = copy;
    array[nitems+1] = NULL;
    return array;
}


/* Free a dynamic array of dynamic strings. */
void str_array_free(char **array)
{
    if (array == NULL)
        return;
    for (size_t i = 0; array[i] != NULL; ++i)
        free(array[i]);
    free(array);
}


/* Split a string into substrings. Return dynamic array of dynamically
   allocated substrings, or NULL if there was an error. Caller is
   expected to free the memory, for example with str_array_free. */
char **str_split(const char *input, const char *sep)
{
    size_t nitems = 0;
    char **array = NULL;
    const char *start = input;
    char *next = strstr(start, sep);
    size_t seplen = strlen(sep);
    const char *item;
    size_t itemlen;

    for (;;) {
        next = strstr(start, sep);
        if (next == NULL) {
            /* Add the remaining string (or empty string, if input ends with
               separator. */
            char **new = str_array_append(array, nitems, start, strlen(start));
            if (new == NULL) {
                str_array_free(array);
                return NULL;
            }
            array = new;
            ++nitems;
            break;
        } else if (next == input) {
            /* Input starts with separator. */
            item = "";
            itemlen = 0;
        } else {
            item = start;
            itemlen = next - item;
        }
        char **new = str_array_append(array, nitems, item, itemlen);
        if (new == NULL) {
            str_array_free(array);
            return NULL;
        }
        array = new;
        ++nitems;
        start = next + seplen;
    }

    if (nitems == 0) {
        /* Input does not contain separator at all. */
        assert(array == NULL);
        array = str_array_append(array, nitems, input, strlen(input));
    }

    return array;
}


/* Return length of a NULL-delimited array of strings. */
size_t str_array_len(char **array)
{
    size_t len;

    for (len = 0; array[len] != NULL; ++len)
        continue;
    return len;
}


#define MAX_OUTPUT 20


int main(void)
{
    struct {
        const char *input;
        const char *sep;
        char *output[MAX_OUTPUT];
    } tab[] = {
        /* Input is empty string. Output should be a list with an empty 
           string. */
        {
            "",
            "and",
            {
                "",
                NULL,
            },
        },
        /* Input is exactly the separator. Output should be two empty 
           strings. */
        {
            "and",
            "and",
            {
                "",
                "",
                NULL,
            },
        },
        /* Input is non-empty, but does not have separator. Output should
           be the same string. */
        {
            "foo",
            "and",
            {
                "foo",
                NULL,
            },
        },
        /* Input is non-empty, and does have separator. */
        {
            "foo bar 1 and foo bar 2",
            " and ",
            {
                "foo bar 1",
                "foo bar 2",
                NULL,
            },
        },
    };
    const int tab_len = sizeof(tab) / sizeof(tab[0]);
    bool errors;

    errors = false;

    for (int i = 0; i < tab_len; ++i) {
        printf("test %d\n", i);

        char **output = str_split(tab[i].input, tab[i].sep);
        if (output == NULL) {
            fprintf(stderr, "output is NULL\n");
            errors = true;
            break;
        }
        size_t num_output = str_array_len(output);
        printf("num_output %lu\n", (unsigned long) num_output);

        size_t num_correct = str_array_len(tab[i].output);
        if (num_output != num_correct) {
            fprintf(stderr, "wrong number of outputs (%lu, not %lu)\n",
                    (unsigned long) num_output, (unsigned long) num_correct);
            errors = true;
        } else {
            for (size_t j = 0; j < num_output; ++j) {
                if (strcmp(tab[i].output[j], output[j]) != 0) {
                    fprintf(stderr, "output[%lu] is '%s' not '%s'\n",
                            (unsigned long) j, output[j], tab[i].output[j]);
                    errors = true;
                    break;
                }
            }
        }

        str_array_free(output);
        printf("\n");
    }

    if (errors)
        return EXIT_FAILURE;   
    return 0;
}

這是我剛剛寫的一個很好的簡短示例,展示了如何使用strstr在給定的字符串上拆分字符串:

#include <string.h>
#include <stdio.h>

void split(char *phrase, char *delimiter)
{
    char *loc = strstr(phrase, delimiter);
    if (loc == NULL)
    {
        printf("Could not find delimiter\n");
    }
    else
    {
        char buf[256]; /* malloc would be much more robust here */
        int length = strlen(delimiter);
        strncpy(buf, phrase, loc - phrase);
        printf("Before delimiter: '%s'\n", buf);
        printf("After delimiter: '%s'\n", loc+length);
    }
}

int main()
{
    split("foo bar 1 and foo bar 2", "and");
    printf("-----\n");
    split("foo bar 1 and foo bar 2", "quux");
    return 0;
}

輸出:

Before delimiter: 'foo bar 1 '
After delimiter: ' foo bar 2'
-----
Could not find delimiter

當然,我還沒有對它進行完整的測試,它可能容易受到與字符串長度有關的大多數標准緩沖區溢出問題的影響。 但這至少是一個可以證明的例子。

如果您知道定界符示例逗號或分號的類型,則可以嘗試以下操作:

#include<stdio.h>
#include<conio.h>
int main()
{
  int i=0,temp=0,temp1=0, temp2=0;
  char buff[12]="123;456;789";
   for(i=0;buff[i]!=';',i++)
   {
     temp=temp*10+(buff[i]-48);
   }
   for(i=0;buff[i]!=';',i++)
   {
     temp1=temp1*10+(buff[i]-48);
   }
   for(i=0;buff[i],i++)
   {
     temp2=temp2*10+(buff[i]-48);
   }
    printf("temp=%d temp1=%d temp2=%d",temp,temp1,temp2);
    getch();
  return 0;
}

輸出:

temp=123 temp1=456 temp2=789

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM