Is it possible to use strtok function on only non-overlapping strings?

In the code below there are two instances of the substring "on" in the string data. Is it possible to apply strtok only to the "on" that is non-overlapping (i.e. is not part of another word)? If so, can someone please tell me how, and what I am doing wrong in the code below?

#include<stdio.h>
#include<string.h>
#include<ctype.h>

int main()
{  
  char data[50]="Jason could you please turn on the TV";
  char delimiter[5]="on";

  char *ptr,*pointer,*pa,*p,*pb[10];
  int i=0,j=0,k=0,count=0;

  p=data;
  pointer=data;

  while((*pointer!='\0')&&(pointer=strstr(pointer,delimiter)))
  {
    pa=pointer+strlen(delimiter);
    ptr=(--pointer);

    while((isspace(*ptr))&&(isspace(*pa)))
    {
      pb[count]=strtok(ptr,delimiter);
      printf("%s\n",pb[count]);
      count++;
      break;

     } 

      pointer++;
     (*pointer)++;

  }   


}

strspn and strcspn can be used to parse a string for a matching word.
strtok will split the string at each occurrence of any of the individual characters in the delimiter. This is not well suited for what you appear to want to do.

#include <stdio.h>
#include <string.h>

int main() {
    char data[50]="Jason could you please turn on the TV";
    char delimiter[5]="on";
    char *parse = data;
    size_t space = 0;
    size_t span = 0;

    while ( *parse){//parse not pointing to zero terminator
        space = strspn ( parse, " \n\t");//leading whitespace
        parse += space;//advance past whitespace
        span = strcspn ( parse, " \n\t");//not whitespace
        if ( span) {
            printf("word is: %.*s\n", (int)span, parse);//prints span number of characters
        }
        if ( span == strlen ( delimiter) && 0 == strncmp ( delimiter, parse, span)) {//whole word only, not a prefix or an empty span
            printf ( "\tword matches delimiter: %s\n", delimiter);//found match
        }
        parse += span;//advance past non whitespace for next word
    }
    return 0;
}

EDIT:

#include <stdio.h>
#include <string.h>

int main() {
    char data[50]="Jason could you please turn on the TV";
    char delimiter[5]="on";
    char *parse = data;
    size_t space = 0;
    size_t span = 0;

    while ( *parse){//parse not pointing to zero terminator
        space = strspn ( parse, " \n\t");//leading whitespace
        parse += space;//advance past whitespace
        span = strcspn ( parse, " \n\t");//not whitespace
        if ( span) {
            printf("word is: %.*s\n", (int)span, parse);//prints span number of characters
            if ( span == strlen ( delimiter) && 0 == strncmp ( delimiter, parse, span)) {//whole word only, not a prefix
                printf ( "\tword matches delimiter: %s\n", delimiter);//found match
                *parse = 0;
                parse += span;
                space = strspn ( parse, " \n\t");//leading whitespace
                parse += space;
                break;
            }
        }
        parse += span;//advance past non whitespace for next word
    }
    printf ( "\n\nsplit strings:\n%s\n%s\n", data, parse);
    return 0;
}

The basic approach could be wrapped in a function. This would divide the original string into as many sub-strings as the delimiting word requires. None of the sub-strings are stored, but that can be achieved with a modification.

#include <stdio.h>
#include <string.h>

char *strwordsep ( char *str, char *word, size_t *stop) {
    char *parse = str;
    size_t space = 0;
    size_t span = 0;

    while ( *parse){//parse not pointing to zero terminator
        space = strspn ( parse, " \n\t");//leading whitespace
        parse += space;//advance past whitespace
        span = strcspn ( parse, " \n\t");//not whitespace
        if ( span) {
            // printf("word is: %.*s\n", (int)span, parse);//prints span number of characters
            if ( span == strlen ( word) && 0 == strncmp ( word, parse, span)) {//whole word only, not a prefix
                // printf ( "\tword matches delimiter: %s\n", word);//found match
                // *parse = 0;//zero terminate
                *stop = parse - str;
                parse += span;//advance past delimiter
                space = strspn ( parse, " \n\t");//leading whitespace
                parse += space;//advance past whitespace
                return parse;
            }
        }
        parse += span;//advance past non whitespace for next word
    }
    return NULL;
}

int main() {
    char data[]="Jason, I am on the phone, could you please turn on the TV";
    char word[5]="on";
    char *lead = data;
    char *trail = data;
    size_t stop = 0;
    while ( ( trail = strwordsep ( lead, word, &stop))) {
        printf ( "\nsplit strings:\n%.*s\n", (int)stop, lead);
        lead = trail;
    }
    if ( *lead) {
        printf ( "\nsplit strings:\n%s\n", lead);
    }
    return 0;
}

Edit

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char *strwordsep ( char *str, char *word, size_t *stop) {
    char *parse = str;
    size_t space = 0;
    size_t span = 0;

    while ( *parse){//parse not pointing to zero terminator
        space = strspn ( parse, " \n\t");//leading whitespace
        parse += space;//advance past whitespace
        span = strcspn ( parse, " \n\t");//not whitespace
        if ( span) {
            // printf("word is: %.*s\n", (int)span, parse);//prints span number of characters
            if ( span == strlen ( word) && 0 == strncmp ( word, parse, span)) {//whole word only, not a prefix
                // printf ( "\tword matches delimiter: %s\n", word);//found match
                // *parse = 0;//zero terminate
                *stop = parse - str;
                parse += span;//advance past delimiter
                space = strspn ( parse, " \n\t");//leading whitespace
                parse += space;//advance past whitespace
                return parse;
            }
        }
        parse += span;//advance past non whitespace for next word
    }
    return NULL;
}

char **freelines ( char **ppc) {
    int each = 0;
    while ( ppc && ppc[each]) {//loop until sentinel NULL; ppc itself is NULL if the very first realloc failed
        free ( ppc[each]);//free memory
        each++;
    }
    free ( ppc);//free pointers
    return NULL;
}

char **addline ( char **ppc, int *lines, char *add, int length) {
    char **temp = NULL;
    if ( ( temp = realloc ( ppc, sizeof ( *temp) * ( *lines + 2)))) {//add pointer
        ppc = temp;//assign reallocated pointer to original
        if ( ( ppc[*lines] = malloc ( length + 1))) {//allocate memory to pointer
            strncpy ( ppc[*lines], add, length);//copy length characters to pointer
            ppc[*lines][length] = 0;
        }
        else {
            fprintf ( stderr, "problem malloc\n");
            ppc = freelines ( ppc);//release memory
            return ppc;
        }
        ppc[*lines + 1] = NULL;//sentinel NULL
        *lines = *lines + 1;
    }
    else {
        fprintf ( stderr, "problem realloc\n");
        ppc = freelines ( ppc);//release memory
        return ppc;
    }
    return ppc;
}

void showlines ( char **ppc) {
    int each = 0;
    while ( ppc[each]) {
        printf ( "output[%d]= %s\n", each, ppc[each]);
        each++;
    }
}

int main() {
    char data[]="Jason, I am on the phone, could you please turn on the TV";
    char word[5]="on";
    char **output = NULL;//pointer to pointer to store sub-strings
    char *lead = data;
    char *trail = data;
    int lines = 0;
    size_t stop = 0;
    while ( ( trail = strwordsep ( lead, word, &stop))) {
        if ( ! ( output = addline ( output, &lines, lead, (int)stop))) {
            return 0;
        }
        lead = trail;
    }
    if ( *lead) {
        if ( ! ( output = addline ( output, &lines, lead, (int)strlen ( lead)))) {
            return 0;
        }
    }
    showlines ( output);
    output = freelines ( output);
    return 0;
}

It's not entirely clear from your use of "non-overlapping" what your intent with data is, but I take it from your additional comments that you want to find "on" within data as a whole word, and not the "on" that is part of "Jason".

When attempting to locate "on" within data, you don't need strtok, strspn or strcspn. The correct tool for the job is strstr, which allows you to find the first occurrence of a substring within a string. Your only job is identifying the correct substring to search for.

Since in this case you want to find "on" as a whole word, why not search for " on" to locate "on" preceded by a space? (You could expand that to all whitespace as well, but for the purposes of your sentence we will use space-separated words, and then check the character that follows "on" against all whitespace and punctuation to ensure the word ends there.)

First, regarding your initialization of data: unless you intend to append to your string within your code, there is no need to specify the magic number 50. Simply leave the [] empty and data will be sized appropriately to hold the string, e.g.

    char data[]="Jason could you please turn on the TV",
        *p = data;    /* pointer to data */

Likewise, unless you plan on changing your delimiter, you can simply use a string literal, e.g.

    const char *delim = " on";

Then to locate " on" within data, all you need is a single call to strstr (p, delim), and you can make the call within a conditional expression to determine whether it exists, e.g.

    if ((p = strstr (p, delim))) {
        size_t len = strlen (delim);
        char *next = p + len;
        if (isspace (*next) || ispunct (*next)) {
            printf ("found: '%s' (now what?)\n", ++p);
        }
    }

If it is found, just declare a pointer (or use array indexing with p) to access the next character after " on". You can then test whether what follows " on" is whitespace (or punctuation), which confirms you have found your wanted substring. Since you know p points to the space before "on", you can simply increment the pointer p to point to "on" itself, as was done above within the printf statement. What you do with the remainder of the string is up to you. You have p pointing to the start of "on" and next pointing to the whitespace following "on", so you can trivially copy "on" or nul-terminate at next, whatever it is you need to do.

Putting it all together, you would have:

#include <stdio.h>
#include <string.h>
#include <ctype.h>

int main (void) {

    char data[]="Jason could you please turn on the TV",
        *p = data;
    const char *delim = " on";

    if ((p = strstr (p, delim))) {
        size_t len = strlen (delim);
        char *next = p + len;
        if (isspace (*next) || ispunct (*next)) {
            printf ("found: '%s' (now what?)\n", ++p);
        }
    }

    return 0;
}

Example Use/Output

$ ./bin/strstr_on
found: 'on the TV' (now what?)

Look things over and let me know if you have any further questions.

Finding Multiple "on" In String

As explained in the comments below, if you have multiple "on" in your input, all you need to do is turn the above if statement into a loop and set p = next; at the end of the loop. For example, the only changes needed to find all substrings beginning with "on" are:

    char data[]="Jason could you please turn on the TV on the desk",
    ...
    while ((p = strstr (p, delim))) {
        size_t len = strlen (delim);
        char *next = p + len;
        if (isspace (*next) || ispunct (*next)) {
            printf ("found: '%s' (now what?)\n", ++p);
        }
        p = next;
    }

Use/Output Finding All "on"

$ ./bin/strstr_on
found: 'on the TV on the desk' (now what?)
found: 'on the desk' (now what?)

Let me know if you have any more questions.
