
Tokenize a string with a specific multi-byte delimiter

I need to parse a string like this:

link:a link:blink:c link:d lkjh

The output should be a, blink:c, d.

But the output using strtok is, understandably, a, b, c, d, jh.

How do I make sure that only link: splits the string (avoiding the case of blink:c getting split)? Also, how do I ensure that the last token, lkjh, doesn't appear in the output (k seems to be treated as a delimiter here)?

First, passing a delimiter string to strtok does not do what you seem to think. If you pass "link:" as the delim argument, strtok treats every one of those characters as a separate delimiter rather than matching the whole sequence. This is why lkjh is being split and returning jh.
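To make the character-set behaviour concrete, here is a minimal sketch that collects the tokens strtok produces. The helper name split_with_strtok is hypothetical, and the delimiter string "link: " (with a trailing space) is an assumption about what the question used, since the exact call is not shown.

```c
#include <string.h>

/* Hypothetical helper: splits buf in place with strtok and stores up to
   max token pointers in out. Returns the number of tokens stored.
   Note that strtok overwrites delimiters in buf with '\0'. */
size_t split_with_strtok(char *buf, const char *delims, char **out, size_t max)
{
    size_t n = 0;
    for (char *tok = strtok(buf, delims); tok != NULL && n < max;
         tok = strtok(NULL, delims))
        out[n++] = tok;
    return n;
}
```

Called on a writable copy of the sample input with delims set to "link: ", this yields the five tokens a, b, c, d and jh, because l, i, n, k, : and the space each act as a delimiter on their own.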

You are better off splitting on spaces and then checking that the beginning of each token matches "link:".

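/* Needs <stdio.h> and <string.h>; input_string must be a writable char array. */
const char * delim = " ";
const char * prefix = "link:";
const size_t len_prefix = strlen( prefix );

char * token = strtok( input_string, delim );
while( token != NULL ) {
    /* Only tokens that start with "link:" are printed, minus the prefix. */
    if( 0 == strncmp( token, prefix, len_prefix ) )
    {
        printf( "%s\n", token + len_prefix );
    }
    token = strtok( NULL, delim );
}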

If you need something more complicated than this, write your own tokenizer or use a regular expression.
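As one possible starting point for writing your own, the sketch below does the same prefix check without strtok, so the input is not modified and can be a string literal. The function name print_prefixed_words is hypothetical, and it assumes words are separated only by spaces.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: scans input without modifying it and prints the
   text after prefix for every space-separated word that starts with
   prefix. Returns the number of words printed. */
int print_prefixed_words(const char *input, const char *prefix)
{
    const size_t len_prefix = strlen(prefix);
    const char *p = input;
    int count = 0;

    for (;;) {
        p += strspn(p, " ");              /* skip spaces between words */
        if (*p == '\0')
            break;
        size_t len = strcspn(p, " ");     /* length of the current word */
        if (len >= len_prefix && strncmp(p, prefix, len_prefix) == 0) {
            /* %.*s prints only the part of the word after the prefix */
            printf("%.*s\n", (int)(len - len_prefix), p + len_prefix);
            count++;
        }
        p += len;
    }
    return count;
}
```

For the sample input and the prefix "link:", this prints a, blink:c and d on separate lines and returns 3; the word lkjh is skipped because it does not start with the prefix.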

Answering my own question.

Basically, find the first link:, then keep moving forward until the next space or the terminating null character, and repeat on the remainder of the string.

A better approach is simply to use strstr:

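/* Needs <stdio.h> and <string.h>. */
void parseSubFn(char *string)
{
    const char *needle = "link:";
    char ret[128]; /* buffer for the current token */

    if (string == NULL || *string == '\0')
        return;

    /* Find the next "link:"; stop when there are no more occurrences. */
    char *location = strstr(string, needle);
    if (location == NULL)
        return;

    /* Copy what follows "link:" up to the next space, newline, or end of string. */
    char *start = location + strlen(needle);
    size_t i = 0;
    while (*start != '\0' && *start != '\n' && *start != ' ' && i < sizeof ret - 1)
        ret[i++] = *start++;
    ret[i] = '\0'; /* terminate the token before printing it */
    printf("%s\n", ret);

    /* Repeat on the remainder of the string. */
    parseSubFn(start);
}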
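One caveat with a plain strstr search is that it also matches "link:" inside a standalone word such as blink:x. The following is a minimal iterative sketch that only accepts a match at the start of a word; the helper name next_link_value is hypothetical, and it assumes the caller passes a pointer that sits at a word boundary (the start of the string, or a space, newline, or end of the previous value).

```c
#include <string.h>

/* Hypothetical helper: returns a pointer to the text that follows the next
   "link:" beginning a word (at the start of s, or right after a space or
   newline), or NULL if there is no such occurrence. */
const char *next_link_value(const char *s)
{
    static const char needle[] = "link:";
    const char *base = s;
    const char *hit;

    while ((hit = strstr(s, needle)) != NULL) {
        if (hit == base || hit[-1] == ' ' || hit[-1] == '\n')
            return hit + (sizeof needle - 1);
        s = hit + 1; /* match was inside a word such as "blink:", keep looking */
    }
    return NULL;
}
```

A caller can take the length of each value with strcspn(v, " \n") and pass v plus that length back in to get the next one. On the sample input this visits a, blink:c and d and then returns NULL, and a lone blink:x yields no match at all.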


 