Split string with delimiters in C
How do I write a function in the C programming language that splits a string on a delimiter and returns an array of strings?
char* str = "JAN,FEB,MAR,APR,MAY,JUN,JUL,AUG,SEP,OCT,NOV,DEC";
str_split(str,',');
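You can use the strtok() function to split a string (and specify the delimiter to use). Note that strtok() modifies the string passed to it. If the original string is needed elsewhere, make a copy of it and pass the copy to strtok().
EDIT:
Example (note that it does not handle consecutive delimiters, "JAN,,,FEB,MAR" for example):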
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>
char** str_split(char* a_str, const char a_delim)
{
char** result = 0;
size_t count = 0;
char* tmp = a_str;
char* last_comma = 0;
char delim[2];
delim[0] = a_delim;
delim[1] = 0;
/* Count how many elements will be extracted. */
while (*tmp)
{
if (a_delim == *tmp)
{
count++;
last_comma = tmp;
}
tmp++;
}
/* Add space for trailing token. */
count += last_comma < (a_str + strlen(a_str) - 1);
/* Add space for terminating null string so caller
knows where the list of returned strings ends. */
count++;
result = malloc(sizeof(char*) * count);
if (result)
{
size_t idx = 0;
char* token = strtok(a_str, delim);
while (token)
{
assert(idx < count);
*(result + idx++) = strdup(token);
token = strtok(0, delim);
}
assert(idx == count - 1);
*(result + idx) = 0;
}
return result;
}
int main()
{
char months[] = "JAN,FEB,MAR,APR,MAY,JUN,JUL,AUG,SEP,OCT,NOV,DEC";
char** tokens;
printf("months=[%s]\n\n", months);
tokens = str_split(months, ',');
if (tokens)
{
int i;
for (i = 0; *(tokens + i); i++)
{
printf("month=[%s]\n", *(tokens + i));
free(*(tokens + i));
}
printf("\n");
free(tokens);
}
return 0;
}
Output:
$ ./main.exe
months=[JAN,FEB,MAR,APR,MAY,JUN,JUL,AUG,SEP,OCT,NOV,DEC]
month=[JAN]
month=[FEB]
month=[MAR]
month=[APR]
month=[MAY]
month=[JUN]
month=[JUL]
month=[AUG]
month=[SEP]
month=[OCT]
month=[NOV]
month=[DEC]
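The note about consecutive delimiters can be checked with a small sketch. count_strtok_tokens is a hypothetical helper, not part of the answer's code, and it assumes only the standard strtok(), malloc() and memcpy(). It works on a private copy, so string literals are safe to pass in. strtok() treats a run of delimiters as one separator, so "JAN,,,FEB,MAR" produces three tokens, while the counting pass in str_split() expects five; that mismatch is what trips assert(idx == count - 1).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Count the tokens strtok() reports for `text`, working on a private copy
 * so that string literals and const data are safe to pass in.
 * Returns (size_t)-1 if the copy cannot be allocated. */
size_t count_strtok_tokens(const char *text, const char *delims)
{
    assert(text != NULL && delims != NULL);

    size_t len = strlen(text);
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return (size_t)-1;
    memcpy(copy, text, len + 1);   /* includes the terminating '\0' */

    size_t count = 0;
    char *tok = strtok(copy, delims);
    while (tok != NULL) {
        count++;                   /* empty fields never show up here */
        tok = strtok(NULL, delims);
    }

    free(copy);
    return count;
}
```

If empty fields matter to you (CSV-like data, for instance), strtok() is the wrong tool and a scanner that reports every field is needed instead.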
I think strsep is still the best tool for this:
while ((token = strsep(&str, ","))) my_fn(token);
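That is literally one line that splits a string.
The extra parentheses are a stylistic element to indicate that we are intentionally testing the result of an assignment, not using the equality operator ==.
For that pattern to work, token and str must both have type char *. If you start with a string literal, you will want to make a copy of it first: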
// More general pattern:
const char *my_str_literal = "JAN,FEB,MAR";
char *token, *str, *tofree;
tofree = str = strdup(my_str_literal); // We own str's memory now.
while ((token = strsep(&str, ","))) my_fn(token);
free(tofree);
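If two delimiters appear together in str, you will get a token value that is the empty string. The value of str is modified, because each delimiter encountered is overwritten with a zero byte - another good reason to copy the string being parsed first.
In a comment, someone suggested that strtok is better than strsep because strtok is more portable. Ubuntu and Mac OS X have strsep; it is safe to guess that other unixy systems do as well. Windows lacks strsep, but it has strpbrk, which enables this short and sweet strsep replacement: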
char *strsep(char **stringp, const char *delim) {
if (*stringp == NULL) { return NULL; }
char *token_start = *stringp;
*stringp = strpbrk(token_start, delim);
if (*stringp) {
**stringp = '\0';
(*stringp)++;
}
return token_start;
}
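To make the empty-token behaviour concrete, here is a minimal sketch built on the same strpbrk() logic. The names sep_next and count_fields are assumptions made for this illustration (sep_next is the replacement above under another name, so it cannot clash with a system strsep()). The sketch splits a buffer in place and reports how many fields were found and how many of them were empty.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same logic as the strsep() replacement above, under a different name
 * (an assumption, chosen so it cannot clash with a system strsep()). */
static char *sep_next(char **stringp, const char *delim)
{
    if (*stringp == NULL)
        return NULL;
    char *token_start = *stringp;
    *stringp = strpbrk(token_start, delim);
    if (*stringp != NULL) {
        **stringp = '\0';   /* overwrite the delimiter */
        (*stringp)++;       /* continue after it on the next call */
    }
    return token_start;
}

/* Split `buf` in place; return the number of fields and store in *empty
 * how many of them were empty strings. */
size_t count_fields(char *buf, const char *delim, size_t *empty)
{
    assert(buf != NULL && delim != NULL && empty != NULL);
    size_t fields = 0;
    char *token;
    *empty = 0;
    while ((token = sep_next(&buf, delim)) != NULL) {
        fields++;
        if (*token == '\0')
            (*empty)++;
    }
    return fields;
}
```

For the buffer "JAN,,FEB," this reports four fields, two of them empty: one between the doubled commas and one after the trailing comma.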
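Here is a good explanation of strsep vs strtok. The pros and cons may be judged subjectively; however, I think it is a telling sign that strsep was designed as a replacement for strtok.
String tokenizer: this code should put you in the right direction.
#include <stdio.h>
#include <string.h>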
int main(void) {
char st[] ="Where there is will, there is a way.";
char *ch;
ch = strtok(st, " ");
while (ch != NULL) {
printf("%s\n", ch);
ch = strtok(NULL, " ,");
}
return 0;
}
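The method below does all the work for you (memory allocation, counting the lengths). More information and a description can be found here - Implementation of the Java String.split() method to split a C string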
int split (const char *str, char c, char ***arr)
{
int count = 1;
int token_len = 1;
int i = 0;
const char *p; /* only reads str, so keep it const */
char *t;
p = str;
while (*p != '\0')
{
if (*p == c)
count++;
p++;
}
*arr = (char**) malloc(sizeof(char*) * count);
if (*arr == NULL)
exit(1);
p = str;
while (*p != '\0')
{
if (*p == c)
{
(*arr)[i] = (char*) malloc( sizeof(char) * token_len );
if ((*arr)[i] == NULL)
exit(1);
token_len = 0;
i++;
}
p++;
token_len++;
}
(*arr)[i] = (char*) malloc( sizeof(char) * token_len );
if ((*arr)[i] == NULL)
exit(1);
i = 0;
p = str;
t = ((*arr)[i]);
while (*p != '\0')
{
if (*p != c && *p != '\0')
{
*t = *p;
t++;
}
else
{
*t = '\0';
i++;
t = ((*arr)[i]);
}
p++;
}
*t = '\0'; /* terminate the last token; the loop above ends before doing so */
return count;
}
How to use it:
int main (int argc, char ** argv)
{
int i;
char *s = "Hello, this is a test module for the string splitting.";
int c = 0;
char **arr = NULL;
c = split(s, ' ', &arr);
printf("found %d tokens.\n", c);
for (i = 0; i < c; i++)
printf("string #%d: %s\n", i, arr[i]);
return 0;
}
Here are my two cents:
int split (const char *txt, char delim, char ***tokens)
{
int *tklen, *t, count = 1;
char **arr, *p = (char *) txt;
while (*p != '\0') if (*p++ == delim) count += 1;
t = tklen = calloc (count, sizeof (int));
for (p = (char *) txt; *p != '\0'; p++) *p == delim ? *t++ : (*t)++;
*tokens = arr = malloc (count * sizeof (char *));
t = tklen;
p = *arr++ = calloc (*(t++) + 1, sizeof (char *));
while (*txt != '\0')
{
if (*txt == delim)
{
p = *arr++ = calloc (*(t++) + 1, sizeof (char *));
txt++;
}
else *p++ = *txt++;
}
free (tklen);
return count;
}
Usage:
char **tokens;
int count, i;
const char *str = "JAN,FEB,MAR,APR,MAY,JUN,JUL,AUG,SEP,OCT,NOV,DEC";
count = split (str, ',', &tokens);
for (i = 0; i < count; i++) printf ("%s\n", tokens[i]);
/* freeing tokens */
for (i = 0; i < count; i++) free (tokens[i]);
free (tokens);
#include <string.h>
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
/**
* splits str on delim and dynamically allocates an array of pointers.
*
* On error -1 is returned, check errno
* On success size of array is returned, which may be 0 on an empty string
* or 1 if no delim was found.
*
* You could rewrite this to return the char ** array instead and upon NULL
* know it's an allocation problem but I did the triple array here. Note that
* upon the hitting two delim's in a row "foo,,bar" the array would be:
* { "foo", NULL, "bar" }
*
* You need to define the semantics of a trailing delim Like "foo," is that a
* 2 count array or an array of one? I choose the two count with the second entry
* set to NULL since it's valueless.
* Modifies str so make a copy if this is a problem
*/
int split( char * str, char delim, char ***array, int *length ) {
char *p;
char **res;
int count=0;
int k=0;
p = str;
// Count occurance of delim in string
while( (p=strchr(p,delim)) != NULL ) {
*p = 0; // Null terminate the deliminator.
p++; // Skip past our new null
count++;
}
// allocate dynamic array
res = calloc( 1, count * sizeof(char *));
if( !res ) return -1;
p = str;
for( k=0; k<count; k++ ){
if( *p ) res[k] = p; // Copy start of string
p = strchr(p, 0 ); // Look for next null
p++; // Start of next string
}
*array = res;
*length = count;
return 0;
}
char str[] = "JAN,FEB,MAR,APR,MAY,JUN,JUL,AUG,SEP,OCT,NOV,DEC,";
int main() {
char **res;
int k=0;
int count =0;
int rc;
rc = split( str, ',', &res, &count );
if( rc ) {
printf("Error: %s errno: %d \n", strerror(errno), errno);
}
printf("count: %d\n", count );
for( k=0; k<count; k++ ) {
printf("str: %s\n", res[k]);
}
free(res );
return 0;
}
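I think the following solution is ideal.
Explanation of the code:
1. Define a struct token to store the address and length of each token.
2. Allocate enough of these for the worst case, which is when str consists entirely of separators, so there are strlen(str) + 1 tokens, all of them empty strings.
3. Scan str, recording the address and length of every token.
4. Use that count to allocate an output array of the right size, including extra space for a NULL sentinel value.
5. Allocate, copy, and add the tokens using the recorded start and length; memcpy is used because it is faster than strcpy and we already know the lengths.
6. Free the temporary token array and return the array of strings.
typedef struct {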
const char *start;
size_t len;
} token;
char **split(const char *str, char sep)
{
char **array;
unsigned int start = 0, stop, toks = 0, t;
token *tokens = malloc((strlen(str) + 1) * sizeof(token));
for (stop = 0; str[stop]; stop++) {
if (str[stop] == sep) {
tokens[toks].start = str + start;
tokens[toks].len = stop - start;
toks++;
start = stop + 1;
}
}
/* Mop up the last token */
tokens[toks].start = str + start;
tokens[toks].len = stop - start;
toks++;
array = malloc((toks + 1) * sizeof(char*));
for (t = 0; t < toks; t++) {
/* Calloc makes it nul-terminated */
char *token = calloc(tokens[t].len + 1, 1);
memcpy(token, tokens[t].start, tokens[t].len);
array[t] = token;
}
/* Add a sentinel */
array[t] = NULL;
free(tokens);
return array;
}
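Note that the malloc checks are omitted for brevity.
In general, I would not return an array of char * pointers from a split function like this, because it places a lot of responsibility on the caller to free them correctly. An interface I prefer is to let the caller pass a callback function that is called for every token, as I have described here: Split a String in C.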
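A callback-style interface of the kind mentioned above might look like the following sketch. The linked post is not reproduced in this thread, so the names token_fn, split_each and sum_lengths, and the exact callback signature, are assumptions made for illustration only. The source string is only read, nothing is allocated, and each token is handed over as a start pointer plus a length (it is not nul-terminated).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: receives a token that is NOT nul-terminated,
 * its length, and an opaque pointer supplied by the caller. */
typedef void (*token_fn)(const char *start, size_t len, void *ctx);

/* Call `fn` once per token of `str`, split on `sep`. Empty tokens are
 * reported too. Returns the number of tokens. */
size_t split_each(const char *str, char sep, token_fn fn, void *ctx)
{
    assert(str != NULL && fn != NULL && sep != '\0');
    size_t count = 0;
    const char *start = str;
    for (;;) {
        const char *end = strchr(start, sep);
        size_t len = end ? (size_t)(end - start) : strlen(start);
        fn(start, len, ctx);
        count++;
        if (end == NULL)
            break;
        start = end + 1;   /* skip the separator */
    }
    return count;
}

/* Example callback: adds each token length to the size_t behind ctx. */
void sum_lengths(const char *start, size_t len, void *ctx)
{
    (void)start;
    *(size_t *)ctx += len;
}
```

Because the callee owns no memory, there is nothing for the caller to free. A caller that does want an array can build one inside its own callback and stays in control of its lifetime.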
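Building on the example above, there is a way to return an array of nul-terminated strings (as you wanted) that live in place inside the original string. It does not allow passing a string literal, though, because the string has to be modified by the function: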
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
char** str_split( char* str, char delim, int* numSplits )
{
char** ret;
int retLen;
char* c;
if ( ( str == NULL ) ||
( delim == '\0' ) )
{
/* Either of those will cause problems */
ret = NULL;
retLen = -1;
}
else
{
retLen = 0;
c = str;
/* Pre-calculate number of elements */
do
{
if ( *c == delim )
{
retLen++;
}
c++;
} while ( *c != '\0' );
/* retLen delimiters give retLen + 1 tokens, plus one slot for the NULL sentinel */
ret = malloc( ( retLen + 2 ) * sizeof( *ret ) );
ret[retLen + 1] = NULL;
c = str;
retLen = 1;
ret[0] = str;
do
{
if ( *c == delim )
{
ret[retLen++] = &c[1];
*c = '\0';
}
c++;
} while ( *c != '\0' );
}
if ( numSplits != NULL )
{
*numSplits = retLen;
}
return ret;
}
int main( int argc, char* argv[] )
{
const char* str = "JAN,FEB,MAR,APR,MAY,JUN,JUL,AUG,SEP,OCT,NOV,DEC";
char* strCpy;
char** split;
int num;
int i;
strCpy = malloc( ( strlen( str ) + 1 ) * sizeof( *strCpy ) ); /* +1 for the terminating '\0' */
strcpy( strCpy, str );
split = str_split( strCpy, ',', &num );
if ( split == NULL )
{
puts( "str_split returned NULL" );
}
else
{
printf( "%i Results: \n", num );
for ( i = 0; i < num; i++ )
{
puts( split[i] );
}
}
free( split );
free( strCpy );
return 0;
}
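There is probably a neater way to do it, but you get the idea.
This function takes a char* string and splits it by the deliminator. There can be multiple deliminators in a row. Note that the function modifies the original string. If you need the original string to stay unaltered, you must make a copy of it first. This function does not use any cstring function calls, so it might be a little faster than the others. If you do not care about memory allocation, you can allocate sub_strings at the top of the function with size strlen(src_str)/2 and (like the c++ "version" mentioned) skip the bottom half of the function. If you do this, the function is reduced to O(N), but the memory-optimized way shown below is O(2N).
The function: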
char** str_split(char *src_str, const char deliminator, size_t &num_sub_str){
//replace deliminator's with zeros and count how many
//sub strings with length >= 1 exist
num_sub_str = 0;
char *src_str_tmp = src_str;
bool found_delim = true;
while(*src_str_tmp){
if(*src_str_tmp == deliminator){
*src_str_tmp = 0;
found_delim = true;
}
else if(found_delim){ //found first character of a new string
num_sub_str++;
found_delim = false;
//sub_str_vec.push_back(src_str_tmp); //for c++
}
src_str_tmp++;
}
printf("Start - found %zu sub strings\n", num_sub_str);
if(num_sub_str <= 0){
printf("str_split() - no substrings were found\n");
return(0);
}
//if you want to use a c++ vector and push onto it, the rest of this function
//can be omitted (obviously modifying input parameters to take a vector, etc)
char **sub_strings = (char **)malloc(sizeof(char*) * (num_sub_str + 1)); //+1 pointer for the NULL terminator
const char *src_str_terminator = src_str_tmp;
src_str_tmp = src_str;
bool found_null = true;
size_t idx = 0;
while(src_str_tmp < src_str_terminator){
if(!*src_str_tmp) //found a NULL
found_null = true;
else if(found_null){
sub_strings[idx++] = src_str_tmp;
//printf("sub_string_%d: [%s]\n", idx-1, sub_strings[idx-1]);
found_null = false;
}
src_str_tmp++;
}
sub_strings[num_sub_str] = NULL;
return(sub_strings);
}
How to use it:
char months[] = "JAN,FEB,MAR,APR,MAY,JUN,JUL,AUG,SEP,OCT,NOV,DEC";
char *str = strdup(months);
size_t num_sub_str;
char **sub_strings = str_split(str, ',', num_sub_str);
char *endptr;
if(sub_strings){
for(int i = 0; sub_strings[i]; i++)
printf("[%s]\n", sub_strings[i]);
}
free(sub_strings);
free(str);
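This is a string splitting function that can handle multi-character delimiters. Note that if the delimiter is longer than the string being split, then buffer and stringLengths will be set to (void *) 0, and numStrings will be set to 0.
This algorithm has been tested, and works. (Disclaimer: it has not been tested with non-ASCII strings, and it assumes that the caller supplied valid parameters.)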
void splitString(const char *original, const char *delimiter, char ** * buffer, int * numStrings, int * * stringLengths){
const int lo = strlen(original);
const int ld = strlen(delimiter);
if(ld > lo){
*buffer = (void *)0;
*numStrings = 0;
*stringLengths = (void *)0;
return;
}
*numStrings = 1;
for(int i = 0;i < (lo - ld);i++){
if(strncmp(&original[i], delimiter, ld) == 0) {
i += (ld - 1);
(*numStrings)++;
}
}
*stringLengths = (int *) malloc(sizeof(int) * *numStrings);
int currentStringLength = 0;
int currentStringNumber = 0;
int delimiterTokenDecrementCounter = 0;
for(int i = 0;i < lo;i++){
if(delimiterTokenDecrementCounter > 0){
delimiterTokenDecrementCounter--;
} else if(i < (lo - ld)){
if(strncmp(&original[i], delimiter, ld) == 0){
(*stringLengths)[currentStringNumber] = currentStringLength;
currentStringNumber++;
currentStringLength = 0;
delimiterTokenDecrementCounter = ld - 1;
} else {
currentStringLength++;
}
} else {
currentStringLength++;
}
if(i == (lo - 1)){
(*stringLengths)[currentStringNumber] = currentStringLength;
}
}
*buffer = (char **) malloc(sizeof(char *) * (*numStrings));
for(int i = 0;i < *numStrings;i++){
(*buffer)[i] = (char *) malloc(sizeof(char) * ((*stringLengths)[i] + 1));
}
currentStringNumber = 0;
currentStringLength = 0;
delimiterTokenDecrementCounter = 0;
for(int i = 0;i < lo;i++){
if(delimiterTokenDecrementCounter > 0){
delimiterTokenDecrementCounter--;
} else if(currentStringLength >= (*stringLengths)[currentStringNumber]){
(*buffer)[currentStringNumber][currentStringLength] = 0;
delimiterTokenDecrementCounter = ld - 1;
currentStringLength = 0;
currentStringNumber++;
} else {
(*buffer)[currentStringNumber][currentStringLength] = (char)original[i];
currentStringLength++;
}
}
(*buffer)[currentStringNumber][currentStringLength] = 0;
}
Sample code:
int main(){
const char *string = "STRING-1 DELIM string-2 DELIM sTrInG-3";
char **buffer;
int numStrings;
int * stringLengths;
splitString(string, " DELIM ", &buffer, &numStrings, &stringLengths);
for(int i = 0;i < numStrings;i++){
printf("String: %s\n", buffer[i]);
}
}
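As a point of comparison, multi-character delimiters can also be handled with the standard strstr(). The sketch below is not the author's algorithm; count_pieces and copy_piece are hypothetical helpers written for this illustration. It leaves the source string untouched and copies one requested piece into a caller-supplied buffer, so no heap allocation is involved.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count the pieces `text` falls into when split on the multi-character
 * delimiter `delim`. Matching is left to right and non-overlapping. */
size_t count_pieces(const char *text, const char *delim)
{
    assert(text != NULL && delim != NULL);
    size_t dlen = strlen(delim);
    if (dlen == 0)
        return 1;              /* an empty delimiter never splits */
    size_t pieces = 1;
    const char *p = text;
    while ((p = strstr(p, delim)) != NULL) {
        pieces++;
        p += dlen;             /* resume after the delimiter */
    }
    return pieces;
}

/* Copy piece number `index` (0-based) into `out`, truncating to
 * out_size - 1 characters. Returns 0 on success, -1 if no such piece. */
int copy_piece(const char *text, const char *delim, size_t index,
               char *out, size_t out_size)
{
    assert(text != NULL && delim != NULL && out != NULL && out_size > 0);
    size_t dlen = strlen(delim);
    const char *start = text;
    const char *end = dlen ? strstr(start, delim) : NULL;
    while (index > 0 && end != NULL) {
        start = end + dlen;
        end = strstr(start, delim);
        index--;
    }
    if (index > 0)
        return -1;             /* ran out of pieces */
    size_t len = end ? (size_t)(end - start) : strlen(start);
    if (len >= out_size)
        len = out_size - 1;
    memcpy(out, start, len);
    out[len] = '\0';
    return 0;
}
```

With the sample string above and the delimiter " DELIM ", count_pieces() reports three pieces and copy_piece() with index 1 yields "string-2". Rescanning from the start for each piece is quadratic, which is acceptable for short inputs but not for large ones.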
Required headers:
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
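This optimized method creates (or updates an existing) array of pointers in *result and returns the number of elements in *count.
Use "max" to indicate the maximum number of strings you expect (when you pass in an existing array, or for any other reason); otherwise set it to 0.
To compare against a list of delimiters, define delim as a char* and replace the line:
if (str[i]==delim) {
with the following two lines:
char *c=delim; while(*c && *c!=str[i]) c++;
if (*c) {
Enjoy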
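The delimiter-list check described above can also be wrapped in a small function. This is only a sketch: is_delim is a hypothetical name, and the body is the same scan as the two replacement lines.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if `ch` is one of the characters in `delims`, else 0.
 * Same scan as the two replacement lines, wrapped in a function. */
int is_delim(char ch, const char *delims)
{
    assert(delims != NULL);
    const char *c = delims;
    while (*c && *c != ch)
        c++;
    return *c != '\0';
}
```

With this helper the test in the scanning loop would read if (is_delim(str[i], delim)) {. Unlike a bare strchr(delim, str[i]), the scan never treats the terminating '\0' as a delimiter.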
#include <stdlib.h>
#include <string.h>
char **split(char *str, size_t len, char delim, char ***result, unsigned long *count, unsigned long max) {
size_t i;
char **_result;
// there is at least one string returned
*count=1;
_result= *result;
// when the result array is specified, fill it during the first pass
if (_result) {
_result[0]=str;
}
// scan the string for delimiter, up to specified length
for (i=0; i<len; ++i) {
// to compare against a list of delimiters,
// define delim as a string and replace
// the next line:
// if (str[i]==delim) {
//
// with the two following lines:
// char *c=delim; while(*c && *c!=str[i]) c++;
// if (*c) {
//
if (str[i]==delim) {
// replace delimiter with zero
str[i]=0;
// when result array is specified, fill it during the first pass
if (_result) {
_result[*count]=str+i+1;
}
// increment count for each separator found
++(*count);
// if max is specified, dont go further
if (max && *count==max) {
break;
}
}
}
// when result array is specified, we are done here
if (_result) {
return _result;
}
// else allocate memory for result
// and fill the result array
*result=malloc((*count)*sizeof(char*));
if (!*result) {
return NULL;
}
_result=*result;
// add first string to result
_result[0]=str;
// if theres more strings
for (i=1; i<*count; ++i) {
// find next string
while(*str) ++str;
++str;
// add next string to result
_result[i]=str;
}
return _result;
}
Usage example:
#include <stdio.h>
int main(int argc, char **argv) {
char *str="JAN,FEB,MAR,APR,MAY,JUN,JUL,AUG,SEP,OCT,NOV,DEC";
char **result=malloc(6*sizeof(char*));
char **result2=0;
unsigned long count;
unsigned long count2;
unsigned long i;
split(strdup(str),strlen(str),',',&result,&count,6);
split(strdup(str),strlen(str),',',&result2,&count2,0);
if (result)
for (i=0; i<count; ++i) {
printf("%s\n",result[i]);
}
printf("\n");
if (result2)
for (i=0; i<count2; ++i) {
printf("%s\n", result2[i]);
}
return 0;
}
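Below is my strtok() implementation from the zString library. zstring_strtok() differs from the standard library's strtok() in the way it treats consecutive delimiters.
Have a look at the code below and you should get an idea of how it works (I tried to use as many comments as I could).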
char *zstring_strtok(char *str, const char *delim) {
static char *static_str=0; /* var to store last address */
int index=0, strlength=0; /* integers for indexes */
int found = 0; /* check if delim is found */
/* delimiter cannot be NULL
* if no more char left, return NULL as well
*/
if (delim==0 || (str == 0 && static_str == 0))
return 0;
if (str == 0)
str = static_str;
/* get length of string */
while(str[strlength])
strlength++;
/* find the first occurance of delim */
for (index=0;index<strlength;index++)
if (str[index]==delim[0]) {
found=1;
break;
}
/* if delim is not contained in str, return str */
if (!found) {
static_str = 0;
return str;
}
/* check for consecutive delimiters
*if first char is delim, return delim
*/
if (str[0]==delim[0]) {
static_str = (str + 1);
return (char *)delim;
}
/* terminate the string
* this assignmetn requires char[], so str has to
* be char[] rather than *char
*/
str[index] = '\0';
/* save the rest of the string */
if ((str + index + 1)!=0)
static_str = (str + index + 1);
else
static_str = 0;
return str;
}
Below is an example usage...
Example Usage
char str[] = "A,B,,,C";
printf("1 %s\n",zstring_strtok(str,","));
printf("2 %s\n",zstring_strtok(NULL,","));
printf("3 %s\n",zstring_strtok(NULL,","));
printf("4 %s\n",zstring_strtok(NULL,","));
printf("5 %s\n",zstring_strtok(NULL,","));
printf("6 %s\n",zstring_strtok(NULL,","));
Example Output
1 A
2 B
3 ,
4 ,
5 C
6 (null)
The library can be downloaded from Github: https://github.com/fnoyanisi/zString
My version:
int split(char* str, const char delimeter, char*** args) {
int cnt = 1;
char* t = str;
while (*t == delimeter) t++;
char* t2 = t;
while (*(t2++))
if (*t2 == delimeter && *(t2 + 1) != delimeter && *(t2 + 1) != 0) cnt++;
(*args) = malloc(sizeof(char*) * cnt);
for(int i = 0; i < cnt; i++) {
char* ts = t;
while (*t != delimeter && *t != 0) t++;
int len = (t - ts + 1);
(*args)[i] = malloc(sizeof(char) * len);
memcpy((*args)[i], ts, sizeof(char) * (len - 1));
(*args)[i][len - 1] = 0;
while (*t == delimeter) t++;
}
return cnt;
}
Try this.
char** strsplit(char* str, const char* delim){
char** res = NULL;
char* part;
int i = 0;
char* aux = strdup(str);
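part = strtok(aux, delim); /* do not strdup() here: strtok() may return NULL */
while(part){
res = (char**)realloc(res, (i + 1) * sizeof(char*));
*(res + i) = strdup(part);
part = strtok(NULL, delim);
i++;
}
res = (char**)realloc(res, (i + 1) * sizeof(char*)); /* room for the NULL terminator */
*(res + i) = NULL;
free(aux); /* the tokens were copied, so the working copy can go */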
return res;
}
Explode & implode - the initial string remains intact, dynamic memory allocation
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <stdint.h> /* uintptr_t */
typedef struct
{
uintptr_t ptr;
int size;
} token_t;
int explode(char *str, int slen, const char *delimiter, token_t **tokens)
{
int i = 0, c1 = 0, c2 = 0;
for(i = 0; i <= slen; i++)
{
if(str[i] == *delimiter)
{
c1++;
}
}
if(c1 == 0)
{
return -1;
}
*tokens = (token_t*)calloc((c1 + 1), sizeof(token_t));
((*tokens)[c2]).ptr = (uintptr_t)str;
i = 0;
while(i <= slen)
{
if((str[i] == *delimiter) || (i == slen))
{
((*tokens)[c2]).size = (int)((uintptr_t)&(str[i]) - (uintptr_t)(((*tokens)[c2]).ptr));
if(i < slen)
{
c2++;
((*tokens)[c2]).ptr = (uintptr_t)&(str[i + 1]);
}
}
i++;
}
return (c1 + 1);
}
char* implode(token_t *tokens, int size, const char *delimiter)
{
int i, len = 0;
char *str;
for(i = 0; i < size; i++) /* iterate over the tokens; len starts at 0 and is the running total */
{
len += tokens[i].size + 1;
}
str = (char*)calloc(len, sizeof(char));
len = 0;
for(i = 0; i < size; i++)
{
memcpy((void*)&str[len], (void*)tokens[i].ptr, tokens[i].size);
len += tokens[i].size;
str[(len++)] = *delimiter;
}
str[len - 1] = '\0';
return str;
}
Usage:
int main(int argc, char **argv)
{
int i, c;
char *exp = "JAN,FEB,MAR,APR,MAY,JUN,JUL,AUG,SEP,OCT,NOV,DEC";
token_t *tokens;
char *imp;
printf("%s\n", exp);
if((c = explode(exp, strlen(exp), ",", &tokens)) > 0)
{
imp = implode(tokens, c, ",");
printf("%s\n", imp);
for(i = 0; i < c; i++)
{
printf("%.*s, %d\n", tokens[i].size, (char*)tokens[i].ptr, tokens[i].size);
}
}
free((void*)tokens);
free((void*)imp);
return 0;
}
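If you are willing to use an external library, I can't recommend bstrlib enough. It takes a little extra setup, but it is easier to use in the long run.
For example, to split the string below, first create a bstring with the bfromcstr() call. (A bstring is a wrapper around a char buffer.) Next, split the string on commas, saving the result in a struct bstrList, which has the fields qty and entry, an array of bstrings.
bstrlib has many other functions that operate on bstrings.
Easy as pie...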
#include "bstrlib.h"
#include <stdio.h>
int main() {
int i;
char *tmp = "Hello,World,sak";
bstring bstr = bfromcstr(tmp);
struct bstrList *blist = bsplit(bstr, ',');
printf("num %d\n", blist->qty);
for(i=0;i<blist->qty;i++) {
printf("%d: %s\n", i, bstr2cstr(blist->entry[i], '_'));
}
}
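Late to the party, I know, but here are 2 more functions to play with and probably adjust further to your needs (source code at the bottom of the post).
See also the implementation notes further below, to decide which function suits your needs better.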
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <stdbool.h> // C99
// tokenize destructively
char **str_toksarray_alloc(
char **strp, /* InOut: pointer to the source non-constant c-string */
const char *delim, /* c-string containing the delimiting chars */
size_t *ntoks, /* InOut: # of tokens to parse/parsed (NULL or *ntoks==0 for all tokens) */
bool keepnulls /* false ignores empty tokens, true includes them */
);
// tokenize non-destructively
char **str_toksarray_alloc2(
const char *str, /* the source c-string */
const char *delim,
size_t *ntoks,
bool keepnulls
);
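Their prototypes are almost identical, except for the source string (strp and str, respectively).
strp (pointer to string) is the address of an already allocated, non-constant C string, to be tokenized in place. str is a C string that is not altered (it can even be a string literal). By C string I mean a nul-terminated buffer of chars. The rest of the arguments are the same for both functions.
To parse all available tokens, mute ntoks (that is, set it to 0 before passing it to either function, or pass it as a NULL pointer). Otherwise the functions parse up to *ntoks tokens, or until there are no more tokens (whichever comes first). In any case, when ntoks is non-NULL it gets updated with the count of successfully parsed tokens.
Note also that a non-muted ntoks determines how many pointers will be allocated. So if the source string contains, say, 10 tokens and we set ntoks to 1000, we end up with 990 needlessly allocated pointers. On the other hand, if the source string contains 1000 tokens but we only need the first 10, setting ntoks to 10 sounds like a much wiser choice.
Both functions allocate and return an array of char pointers, but str_toksarray_alloc() makes them point to the tokens inside the modified source string itself, while str_toksarray_alloc2() makes them point to dynamically allocated copies of the tokens (the 2 at the end of its name indicates the 2 levels of allocation).
The returned array is terminated with a NULL sentinel pointer, which is not counted in the passed-back value of ntoks (put otherwise, when non-NULL, ntoks passes back to the caller the length of the returned array, not its 1st level size).
When keepnulls is set to true, the resulting tokens are similar to what we would expect from the strsep() function. This mostly means that consecutive delimiters in the source string produce empty tokens (nulls), and that if delim is an empty C string, or none of its delimiter chars is found in the source string, the result is just 1 token: the source string. Contrary to strsep(), empty tokens can be ignored by setting keepnulls to false.
Failed calls can be identified by checking the return value against NULL, or by checking the passed-back value of ntoks against 0 (provided ntoks was non-NULL). I suggest always checking for failure before attempting to access the returned array, because the functions include sanity checks that can postpone otherwise immediate crashes (for example, passing a NULL pointer as the source string).
On success, the caller should free the array when done with it. For str_toksarray_alloc(), a simple free() is enough. For str_toksarray_alloc2(), a loop is involved because of the 2nd level of allocation. The NULL sentinel (or the passed-back value of a non-NULL ntoks) makes this trivial, but I am also providing a toksarray_free2() function below, for all the lazy bees out there :)
Simplified examples using both functions follow.
Prep: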
const char *src = ";b,test,Tèst,;;cd;ελληνικά,nørmälize,;string to";
const char *delim = ";,";
bool keepnulls = true;
size_t ntoks = 0;
str_toksarray_alloc():
// destructive (use copy of src)
char *scopy = strdup( src );
if (!scopy) { ... }; // handle strdup failure
printf( "%s\n", src );
char **arrtoks = str_toksarray_alloc( &scopy, delim, &ntoks, keepnulls );
printf( "%zu tokens read\n", ntoks );
if ( arrtoks ) {
for (int i=0; arrtoks[i]; i++) {
printf( "%d: %s\n", i, arrtoks[i] );
}
}
free( scopy );
free( arrtoks );
/* OUTPUT
;b,test,Tèst,;;cd;ελληνικά,nørmälize,;string to
11 tokens read
0:
1: b
2: test
3: Tèst
4:
5:
6: cd
7: ελληνικά
8: nørmälize
9:
10: string to
*/
str_toksarray_alloc2():
// non-destructive
keepnulls = false; // reject empty tokens
printf( "%s\n", src );
arrtoks = str_toksarray_alloc2( src, delim, &ntoks, keepnulls );
printf( "%zu tokens read\n", ntoks );
if ( arrtoks ) {
for (int i=0; arrtoks[i]; i++) {
printf( "%d: %s\n", i, arrtoks[i] );
}
}
toksarray_free2( arrtoks ); // dangling arrtoks
// or: arrtoks = toksarray_free2( arrtoks ); // non-dangling artoks
/* OUTPUT
;b,test,Tèst,;;cd;ελληνικά,nørmälize,;string to
7 tokens read
0: b
1: test
2: Tèst
3: cd
4: ελληνικά
5: nørmälize
6: string to
*/
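Both functions use strsep() for the tokenization, which makes them thread-safe, but it is not a standard function. If it is not provided, you can always use an open-source implementation (GNU's or Apple's, for example). The same goes for the function strdup(), which is used in str_toksarray_alloc2() (its implementation is trivial, but again there are GNU's and Apple's, for example).
A side effect of using strsep() in str_toksarray_alloc() is that the starting pointer of the source string keeps moving to the next token in every step of the parsing loop. This means that the caller would not be able to free the parsed string unless they had saved the starting address in an extra pointer. We save them the hassle by doing that locally in the function, using the strpSaved pointer. str_toksarray_alloc2() is not affected by this, because it does not touch the source string.
A main difference between the 2 functions is that str_toksarray_alloc() does not allocate memory for the tokens it finds. It allocates space only for the array of pointers and sets them to point directly into the source string. This works because strsep() nul-terminates the found tokens in place. This dependency can complicate your supporting code, but with big strings it can also make a big difference in performance. If preserving the source string is not important, it can make a big difference in memory footprint too.
On the other hand, str_toksarray_alloc2() allocates and returns a self-sustained array of dynamically allocated copies of the tokens, without further dependencies. It does so first by creating the array from a local duplicate of the source string, and second by duplicating the actual token contents into the array. This is a lot slower and leaves a much bigger memory footprint than str_toksarray_alloc(), but it has no further dependencies and sets no special requirements on the nature of the source string. This makes it easier to write simpler (hence more maintainable) supporting code.
Another difference between the 2 functions is the 1st level of allocation (the array of pointers) when ntoks is muted. They both parse all available tokens, but they take quite different approaches. str_toksarray_alloc() uses alloc-ahead with an initial size of 16 (char pointers), doubling it on demand in the parsing loop. str_toksarray_alloc2() makes a 1st pass counting all available tokens, then allocates that many char pointers just once. That 1st pass is done with the helper function str_tokscount(), which uses the standard functions strpbrk() and strchr(). I am providing the source code of that function too, further below.
Which approach is better is really up to you to decide, depending on the needs of your project. Feel free to adjust the code of each function to either approach and take it from there.
I would say that, on average and for really big strings, alloc-ahead is much faster, especially when the initial size and growth factor are fine-tuned on a per-case basis (by making them function parameters, for example). Saving that extra pass with all those strchr() and strpbrk() calls can make a difference there. However, with relatively small strings, which is pretty much the norm, allocating ahead a bunch of char pointers is overkill. It does not hurt, but it does clutter the code for no good reason in that case. Anyway, feel free to choose whichever suits you best.
The same goes for these 2 functions. I would say that in most cases str_toksarray_alloc2() is much simpler to cope with, since memory and performance are rarely an issue with small to medium strings. If you have to deal with huge strings, then consider using str_toksarray_alloc() (though in those cases you should roll a specialized string parsing function, close to the needs of your project and the specs of your input).
Oh boy, I think that was a bit more than just 2 cents (lol).
Anyway, here is the code of the 2 functions and the helper ones (I have removed most of their description comments, since I have covered pretty much everything already).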
str_toksarray_alloc():
// ----------------------------------------
// Tokenize destructively a nul-terminated source-string.
// Return a dynamically allocated, NULL terminated array of char-pointers
// each pointing to each token found in the source-string, or NULL on error.
//
char **str_toksarray_alloc(char **strp, const char *delim, size_t *ntoks, bool keepnulls)
{
// sanity checks
if ( !strp || !*strp || !**strp || !delim ) {
goto failed;
}
char *strpSaved = *strp; // save initial *strp pointer
bool ntoksOk = (ntoks && *ntoks); // false when ntoks is muted
size_t _ntoks = (ntoksOk ? *ntoks : 16); // # of tokens to alloc-ahead
// alloc array of char-pointers (+1 for NULL sentinel)
char **toksarr = malloc( (_ntoks+1) * sizeof(*toksarr) );
if ( !toksarr ) {
goto failed;
}
// Parse *strp tokens into the array
size_t i = 0; // # of actually parsed tokens
char *tok;
while ( (tok = strsep(strp, delim)) ) {
// if requested, ignore empty tokens
if ( *tok == '\0' && !keepnulls ) {
continue;
}
// non-muted ntoks reached? we are done
if ( ntoksOk && i == _ntoks ) {
*ntoks = i;
break;
}
// muted ntoks & ran out of space? double toksarr and keep parsing
if ( !ntoksOk && i == _ntoks ) {
_ntoks *= 2;
char **tmparr = realloc( toksarr, (_ntoks+1) * sizeof(*tmparr) );
if ( !tmparr ) {
*strp = strpSaved;
free( toksarr );
goto failed;
}
toksarr = tmparr;
}
toksarr[i++] = tok; // get token address
}
toksarr[i] = NULL; // NULL sentinel
*strp = strpSaved; // restore initial *strp pointer
if (ntoks) *ntoks = i; // pass to caller # of parsed tokens
return toksarr;
failed:
if (ntoks) *ntoks = 0;
return NULL;
}
str_toksarray_alloc2():
// ----------------------------------------
// Tokenize non-destructively a nul-terminated source-string.
// Return a dynamically allocated, NULL terminated array of dynamically
// allocated and nul-terminated string copies of each token found in the
// source-string. Return NULL on error.
// The 2 at the end of the name means 2-levels of allocation.
//
char **str_toksarray_alloc2( const char *str, const char *delim, size_t *ntoks, bool keepnulls )
{
// sanity checks
if ( !str || !*str || !delim ) {
if (ntoks) *ntoks = 0;
return NULL;
}
// make a copy of str to work with
char *_str = strdup( str );
if ( !_str ) {
if (ntoks) *ntoks = 0;
return NULL;
}
// if ntoks is muted we'll allocate str_tokscount() tokens, else *ntoks
size_t _ntoks = (ntoks && *ntoks) ? *ntoks : str_tokscount(_str, delim, keepnulls);
if ( _ntoks == 0 ) { // str_tokscount() failed
goto fail_free_str;
}
// alloc the array of strings (+1 for an extra NULL sentinel)
char **toksarr = malloc( (_ntoks+1) * sizeof(*toksarr) );
if ( !toksarr ) {
goto fail_free_str;
}
// Parse str tokens and duplicate them into the array
size_t i = 0; // # of actually parsed tokens
char *tok;
char *cursor = _str; // strsep() advances cursor; _str keeps the address passed to free()
while ( i < _ntoks && (tok = strsep(&cursor, delim)) ) {
// if requested, skip empty tokens
if ( *tok == '\0' && !keepnulls ) {
continue;
}
// duplicate current token into the array
char *tmptok = strdup( tok );
if ( !tmptok ) {
goto fail_free_arr;
}
toksarr[i++] = tmptok;
}
toksarr[i] = NULL; // NULL sentinel
free( _str ); // release the local copy of the source-string
if (ntoks) *ntoks = i; // pass to caller the # of parsed tokens
return toksarr;
// cleanup before failing
fail_free_arr:
for (size_t idx=0; idx < i; idx++) {
free( toksarr[idx] );
}
free( toksarr );
fail_free_str:
free( _str );
if (ntoks) *ntoks = 0;
return NULL;
}
str_tokscount() - helper function, used by str_toksarray_alloc2():
// ----------------------------------------
// Return the count of tokens present in a nul-terminated source-string (str),
// based on the delimiting chars contained in a 2nd nul-terminated string (delim).
// If the boolean argument is false, empty tokens are excluded.
//
// To stay consistent with the behavior of strsep(), the function returns 1 if
// delim is an empty string or none of its delimiters is found in str (in those
// cases the source-string is considered a single token).
// 0 is returned when str or delim are passed as NULL pointers, or when str is
// passed as an empty string.
//
size_t str_tokscount( const char *str, const char *delim, bool keepnulls )
{
// sanity checks
if ( !str || !*str || !delim ) {
return 0;
}
const char *tok = str;
size_t nnulls = strchr(delim, *str) ? 1 : 0;
size_t ntoks = 1; // even when no delims in str, str counts as 1 token
for (; (str = strpbrk(tok, delim)); ntoks++ ) {
tok = ++str;
if ( strchr(delim, *str) ) {
nnulls++;
}
}
return keepnulls ? ntoks : (ntoks - nnulls);
}
toksarray_free2() - use it on the array returned by str_toksarray_alloc2():
// ----------------------------------------
// Free a dynamically allocated, NULL terminated, array of char-pointers
// with each such pointer pointing to its own dynamically allocated data.
// Return NULL, so the caller has the choice of assigning it back to the
// dangling pointer. The 2 at the end of the name means 2-levels of deallocation.
//
// NULL terminated array means ending with a NULL sentinel.
// e.g.: toksarr[0] = tok1, ..., toksarr[len] = NULL
//
char **toksarray_free2( char **toksarr )
{
if ( toksarr ) {
char **toks = toksarr;
while ( *toks ) { // walk until NULL sentinel
free( *toks++ );
}
free( toksarr );
}
return NULL;
}
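Both strtok() and strsep() modify the input string. We can write a function that splits a string on delimiters using strspn() and strpbrk().
Algorithm:
1. If the input string is not empty, go to step 2; otherwise return null.
2. Skip any separators at the start of the string and record where the word begins (using strspn()); call it start.
3. From start, use strpbrk() to find the next separator position (or the end of the string, if no more separators exist); call it end.
4. Allocate memory and copy the text from start to end into it.
5. Return the token.
Advantage:
It does not modify the input string the way strtok() and strsep() do.
Implementation: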
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
/*
* alloc_str function allocates memory and copy substring
* to allocated memory.
*/
static char * alloc_str (const char * start, const char * end) {
if (!start || !end || (start >= end)) {
return NULL;
}
char * tmp = malloc (end - start + 1);
if (tmp) {
memcpy (tmp, start, end - start);
tmp[end - start] = '\0';
} else {
fprintf (stderr, "Failed to allocate memory\n");
exit (EXIT_FAILURE);
}
return tmp;
}
/*
* str_split function returns the next token which is sequences of contiguous
* characters separated by any of the characters that are part of delimiters.
*
* Parameters:
* p_str : Address of pointer to the string that you want to split.
* sep : A set of characters that delimit the pieces in the string.
*
* Behaviour is undefined if sep is not a pointer to a null-terminated string.
*
* Return :
* Returns the pointer to dynamically allocated memory where the token is copied.
* If p_str is NULL or empty string, NULL is returned.
*/
char * str_split (char ** p_str, const char * sep) {
char * token = NULL;
if (*p_str && **p_str) {
char * p_end;
// skip separator
*p_str += strspn(*p_str, sep);
p_end = *p_str;
// find separator
p_end = strpbrk (p_end, sep);
// strpbrk() returns null pointer if no such character
// exists in the input string which is part of sep argument.
if (!p_end) {
p_end = *p_str + strlen (*p_str);
}
token = alloc_str (*p_str, p_end);
*p_str = p_end;
}
return token;
}
/*==================================================*/
/*==================================================*/
/*
* Just a helper function
*/
void token_helper (char * in_str, const char * delim) {
printf ("\nInput string : ");
if (in_str) printf ("\"%s\"\n", in_str);
else printf ("NULL\n");
if (delim) printf ("Delimiter : \"%s\"\n", delim);
char * ptr = in_str;
char * token = NULL;
printf ("Tokens:\n");
while ((token = str_split(&ptr, delim)) != NULL) {
printf ("-> %s\n", token);
/* You can assign this token to a pointer of an array of pointers
* and return that array of pointers from this function.
* Since, this is for demonstration purpose, I am
* freeing the allocated memory now.
*/
free (token);
}
}
/*
* Driver function
*/
int main (void) {
/* test cases */
char string[100] = "hello world!";
const char * delim = " ";
token_helper (string, delim);
strcpy (string, " hello world,friend of mine!");
delim = " ,";
token_helper (string, delim);
strcpy (string, "Another string");
delim = "-!";
token_helper (string, delim);
strcpy (string, " one more -- string !");
delim = "- !";
token_helper (string, delim);
strcpy (string, "");
delim = " ";
token_helper (string, delim);
token_helper (NULL, "");
strcpy (string, "hi");
delim = " -$";
token_helper (string, delim);
strcpy (string, "Give papa a cup of proper coffee in a copper coffee cup.");
delim = "cp";
token_helper (string, delim);
strcpy (string, "JAN,FEB,MAR,APR,MAY,JUN,JUL,AUG,SEP,OCT,NOV,DEC");
delim = ",";
token_helper (string, delim);
return 0;
}
Output:
# ./a.out
Input string : "hello world!"
Delimiter : " "
Tokens:
-> hello
-> world!
Input string : " hello world,friend of mine!"
Delimiter : " ,"
Tokens:
-> hello
-> world
-> friend
-> of
-> mine!
Input string : "Another string"
Delimiter : "-!"
Tokens:
-> Another string
Input string : " one more -- string !"
Delimiter : "- !"
Tokens:
-> one
-> more
-> string
Input string : ""
Delimiter : " "
Tokens:
Input string : NULL
Delimiter : ""
Tokens:
Input string : "hi"
Delimiter : " -$"
Tokens:
-> hi
Input string : "Give papa a cup of proper coffee in a copper coffee cup."
Delimiter : "cp"
Tokens:
-> Give
-> a
-> a a
-> u
-> of
-> ro
-> er
-> offee in a
-> o
-> er
-> offee
-> u
-> .
Input string : "JAN,FEB,MAR,APR,MAY,JUN,JUL,AUG,SEP,OCT,NOV,DEC"
Delimiter : ","
Tokens:
-> JAN
-> FEB
-> MAR
-> APR
-> MAY
-> JUN
-> JUL
-> AUG
-> SEP
-> OCT
-> NOV
-> DEC
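The comment inside token_helper suggests storing each token in an array of pointers instead of freeing it right away. A minimal sketch of that idea follows. It does not reuse the answer's str_split or alloc_str; the names collect_tokens and free_tokens are hypothetical, and the tokenizing is done with strspn and strcspn, which skip runs of delimiters the same way str_split does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of the answer above: split `s` on any
 * character in `delims` and return a NULL-terminated array of newly
 * allocated tokens. Runs of delimiters are skipped. */
char **collect_tokens(const char *s, const char *delims)
{
    assert(s != NULL && delims != NULL);

    size_t used = 0, cap = 4;
    char **out = malloc(cap * sizeof *out);
    if (out == NULL)
        return NULL;

    for (;;) {
        s += strspn(s, delims);            /* skip leading delimiters */
        if (*s == '\0')
            break;
        size_t len = strcspn(s, delims);   /* length of the next token */

        if (used + 1 == cap) {             /* keep one slot for the NULL */
            char **tmp = realloc(out, 2 * cap * sizeof *out);
            if (tmp == NULL)
                break;                     /* return what we have so far */
            out = tmp;
            cap *= 2;
        }
        char *tok = malloc(len + 1);
        if (tok == NULL)
            break;
        memcpy(tok, s, len);
        tok[len] = '\0';
        out[used++] = tok;
        s += len;
    }
    out[used] = NULL;                      /* sentinel marks the end */
    return out;
}

/* Release everything returned by collect_tokens. */
void free_tokens(char **tokens)
{
    if (tokens == NULL)
        return;
    for (char **p = tokens; *p != NULL; p++)
        free(*p);
    free(tokens);
}
```

A caller would walk the array until it reaches the NULL sentinel and then hand the whole array to free_tokens, so ownership stays in one place.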
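My approach is to scan the string and make a pointer point to every character that follows a delimiter (and to the first character), while overwriting each occurrence of the delimiter in the string with '\0'.
First make a copy of the original string (since it is constant), then scan it to get the number of pieces and pass that back through the pointer parameter len. After that, point the first result pointer at the copy and scan the copy: whenever a delimiter is encountered, set it to '\0' so the previous result string is terminated, and point the next result pointer at the following character.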
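#include <stdlib.h>
#include <string.h>
char** split(const char* a_str, const char a_delim, int* len){
    /* +1 leaves room for the terminating '\0' */
    char* s = malloc(strlen(a_str) + 1);
    if (s == NULL) return NULL;
    strcpy(s, a_str);
    /* N delimiters produce N + 1 pieces */
    int count = 1;
    for (const char* tmp = a_str; *tmp != '\0'; tmp++){
        if (*tmp == a_delim) count += 1;
    }
    char** results = malloc(count * sizeof(char*));
    if (results == NULL){
        free(s);
        return NULL;
    }
    *len = count;
    results[0] = s;
    int i = 1;
    while (*s != '\0'){
        if (*s == a_delim){
            *s = '\0';          /* terminate the previous piece */
            s += 1;
            results[i++] = s;   /* the next piece starts here */
        }
        else s += 1;
    }
    /* To release: free(results[0]) frees the copy, then free(results). */
    return results;
}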
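The approach above makes two allocations (the string copy and the pointer array), so the caller has to free both results[0] and results. One possible variation, shown here only as a sketch, places the pointer table and the copy in a single malloc block so that one free() releases everything. The name split_one_block is hypothetical and is not part of the answer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant: the pointer table and the string copy share one
 * malloc block, so a single free(result) releases everything. */
char **split_one_block(const char *a_str, char a_delim, int *len)
{
    assert(a_str != NULL && len != NULL);

    int count = 1;                         /* N delimiters -> N + 1 pieces */
    for (const char *p = a_str; *p != '\0'; p++)
        if (*p == a_delim)
            count++;

    size_t table = (size_t)count * sizeof(char *);
    char **results = malloc(table + strlen(a_str) + 1);
    if (results == NULL)
        return NULL;

    char *s = (char *)results + table;     /* the copy lives after the table */
    strcpy(s, a_str);

    int i = 0;
    results[i++] = s;
    for (; *s != '\0'; s++) {
        if (*s == a_delim) {
            *s = '\0';                     /* terminate the previous piece */
            results[i++] = s + 1;          /* the next piece starts here */
        }
    }
    *len = count;
    return results;
}
```

As in the original approach, empty pieces are preserved: splitting "JAN,,FEB" on ',' gives three pieces, the middle one being an empty string.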
My code (tested):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int dtmsplit(char *str, const char *delim, char ***array, int *length ) {
int i=0;
char *token;
char **res = (char **) malloc(0 * sizeof(char *));
/* get the first token */
token = strtok(str, delim);
while( token != NULL )
{
res = (char **) realloc(res, (i + 1) * sizeof(char *));
res[i] = token;
i++;
token = strtok(NULL, delim);
}
*array = res;
*length = i;
return 1;
}
int main()
{
int i;
int c = 0;
char **arr = NULL;
int count =0;
char str[80] = "JAN,FEB,MAR,APR,MAY,JUN,JUL,AUG,SEP,OCT,NOV,DEC";
c = dtmsplit(str, ",", &arr, &count);
printf("Found %d tokens.\n", count);
for (i = 0; i < count; i++)
printf("string #%d: %s\n", i, arr[i]);
/* The tokens point into str, so only the pointer array needs freeing. */
free(arr);
return(0);
}
Result:
Found 12 tokens.
string #0: JAN
string #1: FEB
string #2: MAR
string #3: APR
string #4: MAY
string #5: JUN
string #6: JUL
string #7: AUG
string #8: SEP
string #9: OCT
string #10: NOV
string #11: DEC
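Try using the strtok function.
The problem here is that you have to process the words immediately. If you want to store them in an array, you have to allocate the correct size for it, which is not known in advance.
So, for example: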
char **Split(char *in_text, char *in_sep)
{
char **ret = NULL;
int count = 0;
char *tmp = strdup(in_text);
char *pos = tmp;
// This is the pass ONE: we count
while ((pos = strtok(pos, in_sep)) != NULL)
{
count++;
pos = NULL;
}
// NOTE: the function strtok changes the content of the string! So we free and duplicate it again!
free(tmp);
pos = tmp = strdup(in_text);
// We create a NULL terminated array hence the +1
ret = calloc(count+1, sizeof(char*));
if (ret == NULL)
{
free(tmp);
return NULL;
}
// This is the pass TWO: we store
count = 0;
while ((pos = strtok(pos, in_sep)) != NULL)
{
ret[count] = strdup(pos);
count++;
pos = NULL;
}
free(tmp);
return ret;
}
// Use this to free
void Free_Array(char** in_array)
{
char **pos = in_array;
while (pos[0] != NULL)
{
free(pos[0]);
pos++;
}
free(in_array);
}
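Note: to avoid allocation problems, we use the same loop and function to compute the count (pass one) and to make the copies (pass two).
Note 2: you may use some other implementation of strtok, for the reasons mentioned in separate posts.
You can use it like this: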
int main(void)
{
char **array = Split("Hello World!", " ");
// Now you have the array
// ...
// Then free the memory
Free_Array(array);
array = NULL;
return 0;
}
(I have not tested it, so please let me know if it does not work!)
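Note 2 above mentions other strtok implementations. On POSIX systems one such option is strtok_r, which keeps its position in a caller-supplied pointer instead of hidden static state, so concurrent or nested splits do not interfere. The following is a sketch of the same two-pass idea on top of it; the name split_r is hypothetical, and both strtok_r and strdup are assumed to be available (they are POSIX rather than ISO C before C23).

```c
#define _POSIX_C_SOURCE 200809L   /* exposes strtok_r and strdup */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical re-entrant variant of Split(): same two-pass idea, but the
 * tokenizer state lives in `save` instead of inside strtok(). */
char **split_r(const char *in_text, const char *in_sep)
{
    assert(in_text != NULL && in_sep != NULL);

    /* Pass one: count the tokens on a scratch copy. */
    char *tmp = strdup(in_text);
    if (tmp == NULL)
        return NULL;
    size_t count = 0;
    char *save = NULL;
    for (char *tok = strtok_r(tmp, in_sep, &save); tok != NULL;
         tok = strtok_r(NULL, in_sep, &save))
        count++;

    /* The scratch copy now contains '\0' bytes, so restore it. */
    strcpy(tmp, in_text);

    /* calloc zero-fills the array; on common platforms that reads as NULL,
     * which gives the terminating sentinel. */
    char **ret = calloc(count + 1, sizeof *ret);
    if (ret == NULL) {
        free(tmp);
        return NULL;
    }

    /* Pass two: duplicate each token into the result. */
    size_t i = 0;
    for (char *tok = strtok_r(tmp, in_sep, &save); tok != NULL;
         tok = strtok_r(NULL, in_sep, &save)) {
        ret[i] = strdup(tok);
        if (ret[i] == NULL)
            break;                /* the array stays NULL-terminated */
        i++;
    }
    free(tmp);
    return ret;
}
```

The result is released the same way as the array returned by Split: free each string up to the NULL sentinel, then free the array itself.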
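Two issues surrounding this question are memory management and thread safety. As you can see from the numerous posts, this is not an easy task to accomplish seamlessly in C. I wanted a solution that is:
The solution I came up with meets all of these criteria. It probably takes a little more work to set up than some of the other solutions posted here, but I think that in practice the extra work is worth it to avoid the common pitfalls of the other solutions.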
#include <stdio.h>
#include <string.h>
struct splitFieldType {
char *field;
int maxLength;
};
typedef struct splitFieldType splitField;
int strsplit(splitField *fields, int expected, const char *input, const char *fieldSeparator, void (*softError)(int fieldNumber,int expected,int actual)) {
int i;
int fieldSeparatorLen=strlen(fieldSeparator);
const char *tNext, *tLast=input;
for (i=0; i<expected && (tNext=strstr(tLast, fieldSeparator))!=NULL; ++i) {
int len=tNext-tLast;
if (len>=fields[i].maxLength) {
softError(i,fields[i].maxLength-1,len);
len=fields[i].maxLength-1;
}
fields[i].field[len]=0;
strncpy(fields[i].field,tLast,len);
tLast=tNext+fieldSeparatorLen;
}
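if (i<expected) {
/* last field: apply the same bounds check and truncation as in the loop */
int len=strlen(tLast);
if (len>=fields[i].maxLength) {
softError(i,fields[i].maxLength-1,len);
len=fields[i].maxLength-1;
}
fields[i].field[len]=0;
strncpy(fields[i].field,tLast,len);
return i+1;
} else {
return i;
}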
}
void monthSplitSoftError(int fieldNumber, int expected, int actual) {
fprintf(stderr,"monthSplit: input field #%d is %d bytes, expected %d bytes\n",fieldNumber+1,actual,expected);
}
int main() {
const char *fieldSeparator=",";
const char *input="JAN,FEB,MAR,APRIL,MAY,JUN,JUL,AUG,SEP,OCT,NOV,DEC,FOO,BAR";
struct monthFieldsType {
char field1[4];
char field2[4];
char field3[4];
char field4[4];
char field5[4];
char field6[4];
char field7[4];
char field8[4];
char field9[4];
char field10[4];
char field11[4];
char field12[4];
} monthFields;
splitField inputFields[12] = {
{monthFields.field1, sizeof(monthFields.field1)},
{monthFields.field2, sizeof(monthFields.field2)},
{monthFields.field3, sizeof(monthFields.field3)},
{monthFields.field4, sizeof(monthFields.field4)},
{monthFields.field5, sizeof(monthFields.field5)},
{monthFields.field6, sizeof(monthFields.field6)},
{monthFields.field7, sizeof(monthFields.field7)},
{monthFields.field8, sizeof(monthFields.field8)},
{monthFields.field9, sizeof(monthFields.field9)},
{monthFields.field10, sizeof(monthFields.field10)},
{monthFields.field11, sizeof(monthFields.field11)},
{monthFields.field12, sizeof(monthFields.field12)}
};
int expected=sizeof(inputFields)/sizeof(splitField);
printf("input data: %s\n", input);
printf("expecting %d fields\n",expected);
int ct=strsplit(inputFields, expected, input, fieldSeparator, monthSplitSoftError);
if (ct!=expected) {
printf("string split %d fields, expected %d\n", ct,expected);
}
for (int i=0;i<expected;++i) {
printf("field %d: %s\n",i+1,inputFields[i].field);
}
printf("\n");
printf("Direct structure access, field 10: %s", monthFields.field10);
}
Below is an example compile and output. Note that in my example I deliberately spelled out "APRIL" so that you can see how the soft error works.
$ gcc strsplitExample.c && ./a.out
input data: JAN,FEB,MAR,APRIL,MAY,JUN,JUL,AUG,SEP,OCT,NOV,DEC,FOO,BAR
expecting 12 fields
monthSplit: input field #4 is 5 bytes, expected 3 bytes
field 1: JAN
field 2: FEB
field 3: MAR
field 4: APR
field 5: MAY
field 6: JUN
field 7: JUL
field 8: AUG
field 9: SEP
field 10: OCT
field 11: NOV
field 12: DEC
Direct structure access, field 10: OCT
Enjoy!
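When every field has the same capacity, the same idea (caller-owned buffers, bounds-checked copies, no malloc and no strtok) can be sketched more compactly with a two-dimensional array. This is an illustration only, not the answer's strsplit: the names split_fixed and FIELD_SIZE are hypothetical, and truncated fields are counted instead of being reported through a soft-error callback.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FIELD_SIZE 4   /* 3 characters plus the terminating '\0' */

/* Hypothetical compact variant: copy up to `max_fields` fields of `input`
 * into caller-owned fixed-size buffers, truncating fields that are too long.
 * Returns the number of fields stored; *truncated counts the cut fields.
 * Extra fields beyond `max_fields` are ignored. */
size_t split_fixed(char out[][FIELD_SIZE], size_t max_fields,
                   const char *input, const char *sep, size_t *truncated)
{
    assert(input != NULL && sep != NULL && sep[0] != '\0');
    assert(truncated != NULL);

    size_t sep_len = strlen(sep);
    size_t n = 0;
    *truncated = 0;

    while (n < max_fields) {
        const char *next = strstr(input, sep);
        size_t len = next ? (size_t)(next - input) : strlen(input);
        if (len >= FIELD_SIZE) {           /* would overflow the buffer */
            len = FIELD_SIZE - 1;
            (*truncated)++;
        }
        memcpy(out[n], input, len);
        out[n][len] = '\0';
        n++;
        if (next == NULL)                  /* that was the last field */
            break;
        input = next + sep_len;            /* separators may be multi-byte */
    }
    return n;
}
```

With a `char months[12][FIELD_SIZE]` buffer and the input string from the example above, this stores 12 fields, cuts "APRIL" down to "APR", and ignores the trailing FOO and BAR.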
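Here is another implementation that operates safely on string literals and matches the prototype requested in the question, returning an allocated pointer-to-pointer to char (i.e. char **). The delimiter string may contain multiple characters, and the input string may contain any number of tokens. All allocations and reallocations are handled by malloc or realloc, without POSIX strdup.
The initial number of pointers allocated is controlled by the NPTRS constant; the only restriction is that it be greater than zero. The returned char ** contains a sentinel NULL after the last token, similar to *argv[], and is in a form usable by execv, execvp and execve.
As with strtok(), multiple consecutive delimiters are treated as a single delimiter, so "JAN,FEB,MAR,APR,MAY,,,JUN,JUL,AUG,SEP,OCT,NOV,DEC" is parsed as if only a single ',' separated "MAY,JUN".
The function below is commented inline, and a short main() is added to split the months. The initial number of pointers is set to 2 to force three reallocations while the input string is tokenized: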
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define NPTRS 2 /* initial number of pointers to allocate (must be > 0) */
/* split src into tokens with sentinel NULL after last token.
* return allocated pointer-to-pointer with sentinel NULL on success,
* or NULL on failure to allocate initial block of pointers. The number
* of allocated pointers are doubled each time reallocation required.
*/
char **strsplit (const char *src, const char *delim)
{
int i = 0, in = 0, nptrs = NPTRS; /* index, in/out flag, ptr count */
char **dest = NULL; /* ptr-to-ptr to allocate/fill */
const char *p = src, *ep = p; /* pointer and end-pointer */
/* allocate/validate nptrs pointers for dest */
if (!(dest = malloc (nptrs * sizeof *dest))) {
perror ("malloc-dest");
return NULL;
}
*dest = NULL; /* set first pointer as sentinel NULL */
for (;;) { /* loop continually until end of src reached */
if (!*ep || strchr (delim, *ep)) { /* if at nul-char or delimiter char */
size_t len = ep - p; /* get length of token */
if (in && len) { /* in-word and chars in token */
if (i == nptrs - 1) { /* used pointer == allocated - 1? */
/* realloc dest to temporary pointer/validate */
void *tmp = realloc (dest, 2 * nptrs * sizeof *dest);
if (!tmp) {
perror ("realloc-dest");
break; /* don't exit, original dest still valid */
}
dest = tmp; /* assign reallocated block to dest */
nptrs *= 2; /* increment allocated pointer count */
}
/* allocate/validate storage for token */
if (!(dest[i] = malloc (len + 1))) {
perror ("malloc-dest[i]");
break;
}
memcpy (dest[i], p, len); /* copy len chars to storage */
dest[i++][len] = 0; /* nul-terminate, advance index */
dest[i] = NULL; /* set next pointer NULL */
}
if (!*ep) /* if at end, break */
break;
in = 0; /* set in-word flag 0 (false) */
}
else { /* normal word char */
if (!in) /* if not in-word */
p = ep; /* update start to end-pointer */
in = 1; /* set in-word flag 1 (true) */
}
ep++; /* advance to next character */
}
return dest;
}
int main (void) {
char *str = "JAN,FEB,MAR,APR,MAY,,,JUN,JUL,AUG,SEP,OCT,NOV,DEC",
**tokens; /* pointer to pointer to char */
if ((tokens = strsplit (str, ","))) { /* split string into tokens */
for (char **p = tokens; *p; p++) { /* loop over filled pointers */
puts (*p);
free (*p); /* don't forget to free allocated strings */
}
free (tokens); /* and pointers */
}
}
Example Use/Output
$ ./bin/splitinput
JAN
FEB
MAR
APR
MAY
JUN
JUL
AUG
SEP
OCT
NOV
DEC
Let me know if you have any further questions.
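As noted above, strtok()-style splitting collapses consecutive delimiters, which loses empty fields in data such as CSV rows. If empty fields must be kept, one option is to step over exactly one delimiter at a time using strcspn. The sketch below only illustrates that difference; the names for_each_field and print_field are hypothetical, and the input string is never modified.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: call `visit` once per field of `s`, where fields are
 * separated by single characters from `delims`. Unlike strtok(), adjacent
 * delimiters yield empty fields. Returns the number of fields visited. */
size_t for_each_field(const char *s, const char *delims,
                      void (*visit)(const char *field, size_t len, void *ctx),
                      void *ctx)
{
    assert(s != NULL && delims != NULL);

    size_t n = 0;
    for (;;) {
        size_t len = strcspn(s, delims);   /* may be 0 for an empty field */
        if (visit != NULL)
            visit(s, len, ctx);
        n++;
        if (s[len] == '\0')                /* reached the end of the input */
            break;
        s += len + 1;                      /* step over exactly one delimiter */
    }
    return n;
}

/* Example visitor: print the field between brackets. The field is not
 * nul-terminated, so the length is passed through "%.*s". */
void print_field(const char *field, size_t len, void *ctx)
{
    (void)ctx;
    printf("[%.*s]\n", (int)len, field);
}
```

Calling `for_each_field("MAY,,,JUN", ",", print_field, NULL)` visits four fields: MAY, two empty fields, and JUN.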
#include <string.h>
#include <stdio.h>
int main(void)
{
    char buf[] = "This is Luke Skywalker here!";
    for( char* tok = strtok( buf, " ");
         tok != NULL;
         tok = strtok( NULL, " ")) {
        puts( tok);
    }
}
Output
This
is
Luke
Skywalker
here!
I tried to make a very simple one. I am also showing an example in main().
#include <stdio.h>
#include <string.h>
void split(char* inputArr, char** outputArr, char* delim) {
char *temp;
temp = strtok(inputArr, delim);
for(int i = 0; temp != NULL; i++) {
outputArr[i] = temp;
temp = strtok(NULL, delim);
}
}
int main(int argc, char **argv){
/* check for proper arguments */
if(argc != 2){
printf("One Argument Expected\n");
} else {
printf("\n");
/*---------main code starts here----------*/
FILE * myScriptFile;
myScriptFile = fopen(argv[1], "r");
if (myScriptFile == NULL) {
perror(argv[1]);
return 1;
}
/* read txt file and split into array like java split() */
int bufferLen = 100;
char buffer[bufferLen];
char *splitArr[100];
while(fgets(buffer, bufferLen, myScriptFile) != NULL){
/* split on spaces and on the newline that fgets keeps */
split(buffer, splitArr, " \n");
printf("Index 0 String: %s\n", splitArr[0]);
printf("Index 1 String: %s\n", splitArr[1]);
printf("Index 2 String: %s\n", splitArr[2]);
printf("Index 3 String: %s\n", splitArr[3]);
}
fclose(myScriptFile);
}
printf("\nProgram-Script Ended\n");
return 0;
}
Suppose a .txt file contains
Hello this is test
Hello2 this is test2
Running it with the .txt file as the argument gives
Index 0 String: Hello
Index 1 String: this
Index 2 String: is
Index 3 String: test
Index 0 String: Hello2
Index 1 String: this
Index 2 String: is
Index 3 String: test2
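I came across this while looking for a simple solution. I am fascinated by all of the options, but none of them satisfied my own use case/taste (which may be terrible).
I have created a somewhat unique solution that aims to behave clearly for its user, to not re-allocate any memory, and to be human readable + commented.
Uploaded to gist.github here: https://gist.github.com/RepComm/1e89f7611733ce0e75c8476d5ef66093
Example: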
#include "./strutils.c"
struct str_split_info info;
info.source = " SPLIT ME hello SPLIT ME world SPLIT ME whats SPLIT ME going SPLIT ME on SPLIT ME today";
info.delimiter = " SPLIT ME ";
str_split_begin(&info);
char * substr;
for (int i=0; i<info.splitStringsCount; i++) {
substr = info.splitStrings[i];
printf("substring: '%s'\n", substr);
}
str_split_end(&info);
Output:
$ ./test
substring: ''
substring: 'hello'
substring: 'world'
substring: 'whats'
substring: 'going'
substring: 'on'
substring: 'today'
Full source of strutils.c
#ifndef STRUTILS_C
#define STRUTILS_C 1
#ifndef str
#define str char *
#endif
#include <stdlib.h>
#include <stdbool.h>
#include <string.h>
#include <stdio.h>
struct str_split_info {
/* The string to be split
* Provided by caller of str_split_begin function
*/
str source;
/* The string that cuts the source string; all occurrences of
* this string will be removed from the source string
* Provided by caller of str_split_begin function
*/
str delimiter;
/* Array of strings split by delimiter
* Provided and allocated by str_split_begin function
* Must be garbage collected by str_split_end function
*/
str * splitStrings;
/* Array of string lengths split by delimiter
* Provided and allocated by str_split_begin function
* Must be garbage collected by str_split_end function
*/
int * splitStringsLengths;
/* Number of strings split by delimiter contained in splitStrings
* Provided by str_split_begin function
*/
int splitStringsCount;
};
#define str_split_infop struct str_split_info *
/* Split a string by a delimiting string
*
* The caller is responsible only for calling str_split_end
* when finished with the results in 'info'
*/
void str_split_begin (str_split_infop info) {
info->splitStringsCount = 0;
int sourceLength = strlen(info->source);
int sourceOffset = 0;
char sourceChar;
int delimiterLength = strlen(info->delimiter);
int delimiterOffset = 0;
char delimiterChar;
//first pass, simply count occurrences so we can allocate only once
for (sourceOffset = 0; sourceOffset<sourceLength; sourceOffset++) {
sourceChar = info->source[sourceOffset];
delimiterChar = info->delimiter[delimiterOffset];
if (sourceChar == delimiterChar) {
delimiterOffset++;
if (delimiterOffset >= delimiterLength) {
delimiterOffset = 0;
//increment count
info->splitStringsCount ++;
}
} else {
delimiterOffset = 0;
}
}
info->splitStringsCount++;
//allocate arrays since we know the count
//this one is an array of strings, which are each char arrays
info->splitStrings = (str *) malloc(sizeof (str) * info->splitStringsCount);
//this one is an array of ints
info->splitStringsLengths = (int*) malloc(sizeof(int) *info->splitStringsCount);
int stringBegin = 0;
int stringEnd = 0;
int splitIndex = 0;
int splitLength = 0;
//second pass, fill the arrays
//forget any partial delimiter match left over from the first pass
delimiterOffset = 0;
for (sourceOffset = 0; sourceOffset<sourceLength; sourceOffset++) {
sourceChar = info->source[sourceOffset];
delimiterChar = info->delimiter[delimiterOffset];
if (sourceChar == delimiterChar) {
delimiterOffset++;
//if we've reached the end of the delimiter
if (delimiterOffset >= delimiterLength) {
//don't worry about delimiter trailing null, strlen doesn't count those
stringEnd = sourceOffset - delimiterLength;
//char count of substring we want to split
splitLength = stringEnd - stringBegin + 1;
//allocate for our substring split
info->splitStrings[splitIndex] = (str) malloc(
//+1 for trailing null for c-string
sizeof(char) * splitLength + 1
);
//copy substring from source into splitStrings array
memcpy(
info->splitStrings[splitIndex],
info->source + stringBegin,
splitLength
);
//explicitly set the last char of this split to a NULL just for fun
info->splitStrings[splitIndex][splitLength] = 0x00;
//conveniently put the substring split size for the
//user of str_split_begin :)
info->splitStringsLengths[splitIndex] = splitLength;
//move to next split index
splitIndex ++;
//reset delimiter offset so we look for new occurrences of it
delimiterOffset = 0;
//next substring split should occur after the current delimiter
stringBegin = sourceOffset+1;
}
} else {
//reset delimiter offset so we look for new occurrences of it
delimiterOffset = 0;
}
}
//handle edge case of last substring after last delimiter
//it is always stored (possibly empty) because splitStringsCount includes it
stringEnd = sourceLength-1;
splitLength = stringEnd - stringBegin + 1;
//allocate for our substring split
info->splitStrings[splitIndex] = (str) malloc(
//+1 for trailing null for c-string
sizeof(char) * splitLength + 1
);
//copy substring from source into splitStrings array
memcpy(
info->splitStrings[splitIndex],
info->source + stringBegin,
splitLength
);
//terminate the last substring and record its length, as in the loop above
info->splitStrings[splitIndex][splitLength] = 0x00;
info->splitStringsLengths[splitIndex] = splitLength;
}
int str_split_count (str_split_infop info) {
return info->splitStringsCount;
}
void str_split_get (str_split_infop info, str * out) {
for (int i=0; i < info->splitStringsCount; i++) {
strcpy(out[i], info->splitStrings[i]);
}
}
void str_split_end (str_split_infop info) {
if (info->splitStringsCount > 0 && info->splitStrings != NULL) {
//free each string allocated
for (int i=0; i < info->splitStringsCount; i++) {
free(info->splitStrings[i]);
}
//free string array pointer
free (info->splitStrings);
//free string lengths array pointer
free(info->splitStringsLengths);
info->splitStringsCount = 0;
}
}
void str_split_test () {
char * source = "hello world this is a test";
str delimiter = " ";
struct str_split_info info;
info.source = source;
info.delimiter = delimiter;
str_split_begin (&info);
//iterate thru split substrings
//NOTE: removed/memory cleanup after str_split_end
for (int i=0; i<info.splitStringsCount; i++) {
// info.splitStrings[i];
}
str_split_end(&info);
}
#endif