C Program to count substrings

Why does my program always skip the last substring count?

eg1. String: dbdbsnasdb dbdxx

Substring: db

count: 4 (no error)

eg2. String: dbdbsnasmfdb

Substring: db

count: 2 (supposed to be 3)

** #include <stdio.h> only

int countSubstr(char string[], char substring[]) {
    int i, j;
    int count = 0;
    int subcount = 0;
    for (i = 0; i <= strlen(string);) {
        j = 0;
        count = 0;
        while ((string[i] == substring[j])) {
            count++;
            i++;
            j++;
        }
        if (count == strlen(substring)) {
            subcount++;
            count = 0;
        } else
            i++;
    }
    return subcount;
}

Also, why must I set j and count to 0 inside the for loop? Is it because j has to start from 0 again (the substring stays the same) on every iteration?

  1. Your inner loop ( while ) can keep comparing well past the null terminator of either string. You need to stop it as soon as one of the strings reaches its terminating null character.
  2. Your outer loop condition has an off-by-one error. But you don't need the strlen call anyway. Just iterate until the null character.
  3. You can also move strlen(substring) outside the loop to avoid potentially recalculating it on every iteration.

A better version might look like:

#include <string.h> /* for strlen */

int countSubstr(char string[], char substring[])
{
    int subcount = 0;
    size_t sub_len = strlen(substring);
    if (!sub_len) return 0; /* an empty substring is counted as no match */

    for (size_t i = 0; string[i];) {
        size_t j = 0;
        size_t count = 0;
        /* stop as soon as either string reaches its null terminator */
        while (string[i] && substring[j] && string[i] == substring[j]) {
            count++;
            i++;
            j++;
        }
        if (count == sub_len) {
            subcount++;
            count = 0;
        }
        else {
            i = i - j + 1; /* no match, so reset to the next index in 'string' */
        }
    }
    return subcount;
}

There are some issues in your code:

  • The loop for (i = 0; i <= strlen(string);) recomputes the length of the string on every iteration, and it iterates one time too many. You should instead write: for (i = 0; string[i] != '\0';)

  • The second loop may run beyond the end of the string and produce a value of count that is too large: for the last match in the second example it produces at least 3, because the null terminators of both strings compare equal and are counted as well. This explains why you get an incorrect number of matches. The behavior is actually undefined, as you are reading beyond the end of both strings.

Here is a corrected version:

int countSubstr(char string[], char substring[]) {
    int len = strlen(string);
    int sublen = strlen(substring);
    int i, j, count = 0;
    for (i = 0; i <= len - sublen; i++) {
        for (j = 0; j < sublen && string[i + j] == substring[j]; j++)
            continue;
        if (j == sublen)
            count++;
    }
    return count;
}

Note that the number of occurrences of the empty string in any given string will come out as one plus the length of the string, which does make sense.

Note also that this code returns 2 for countSubstr("bbb", "bb"), which may or may not be what you expect. The accepted answer returns 1, which is arguable.
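To make the difference between the two results concrete, here is a minimal sketch that counts matches either way. The name count_matches and the allow_overlap flag are made up for this illustration and appear in none of the answers; returning 0 for an empty pattern is also just an assumption of the sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not from any answer above.
   allow_overlap != 0: after a match, resume one character later.
   allow_overlap == 0: after a match, resume just past the match. */
int count_matches(const char *s, const char *sub, int allow_overlap)
{
    assert(s != NULL && sub != NULL);

    size_t len = strlen(s);
    size_t sublen = strlen(sub);
    int count = 0;

    /* Assumption of this sketch: an empty pattern counts as zero matches. */
    if (sublen == 0 || sublen > len)
        return 0;

    for (size_t i = 0; i + sublen <= len; ) {
        if (memcmp(s + i, sub, sublen) == 0) {
            count++;
            i += allow_overlap ? 1 : sublen;
        } else {
            i++;
        }
    }
    return count;
}
```

With this sketch, count_matches("bbb", "bb", 1) gives 2 and count_matches("bbb", "bb", 0) gives 1, so which answer is "right" depends on whether overlapping matches should count.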

This works for all the edge cases I tested:

#include <stdio.h>

int countSubstr(char string[], char substring[])
{
    int count = 0;
    size_t i = 0;
    while(string[i])
    {
        int match = 1;
        size_t j = 0;
        /* stop at the first mismatch so string[i + j] is never read past its '\0' */
        while (match && substring[j])
        {
            match = substring[j] == string[i + j];
            j++;
        }
        count += match;
        i++;
    }
    return count;
}

Here are some test cases:

void test(char name[], int expected, char string[], char substring[]){
    int actual = countSubstr(string, substring);
    char* status = (actual == expected)? "PASS" : "FAIL";
    printf("%s: %s\nActual: %d\nExpected: %d\n\n",name,status,actual,expected);
}

int main(void) {
    test("Two empty strings", 0, "", "");
    test("Empty substring", 19, "sub str sub str sub", "");
    test("Empty string", 0, "", "sub");
    test("Case 1", 4, "dbdbsnasdb dbdxx", "db");
    test("Case 2", 3, "dbdbsnasmfdb", "db");
    test("No match", 0, "dbdbsnasmfdb", "dxb");
    test("Inner matching", 3, "abababa", "aba");
    test("Identity test", 1, "a", "a");
    return 0;
}

In your while loop, you never check whether you have gone past the end of the string.

Edit:
Remember that in C every string has a '\0' at the end, but your while loop doesn't check for it.

For your particular example we get (starting at the last db):
i = 10, j = 0, count = 1 (check for 'd')
i = 11, j = 1, count = 2 (check for 'b')
i = 12, j = 2, count = 3 (check for '\0')
i = 13, j = 3, count = 3 (exit loop)

count = 3 is different from strlen(substring) == 2
-> subcount is not incremented
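One way to avoid counting the '\0', sketched here as an illustration rather than as the only fix: stop the inner loop at the end of the substring, and index with i + j so that i only moves once the outcome is known. The name countSubstrFixed is hypothetical, the length is computed by hand so that strlen and <string.h> are not needed, and treating an empty substring as zero matches is an assumption of this sketch.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical name, chosen to avoid clashing with the countSubstr versions. */
int countSubstrFixed(const char string[], const char substring[])
{
    int sublen = 0;
    int subcount = 0;
    int i, j;

    assert(string != NULL && substring != NULL);

    /* Length of the substring, computed by hand instead of calling strlen. */
    while (substring[sublen] != '\0')
        sublen++;
    if (sublen == 0)
        return 0;   /* assumption: an empty substring counts as zero matches */

    for (i = 0; string[i] != '\0'; ) {
        j = 0;
        /* Stops at the end of the substring, so '\0' is never counted.
           A '\0' in string also stops it, because it cannot equal a
           non-null substring[j]. */
        while (substring[j] != '\0' && string[i + j] == substring[j])
            j++;
        if (j == sublen) {
            subcount++;
            i += sublen;    /* continue right after the match */
        } else {
            i++;            /* no match here, try the next start position */
        }
    }
    return subcount;
}
```

For the two examples in the question this gives 4 and 3. Because it continues after each match, it counts non-overlapping occurrences only, so countSubstrFixed("bbb", "bb") is 1.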

for (i = 0; i <= strlen(string);)

Must be

for (i = 0; i < strlen(string);)

Use two for loops instead of one very complex loop, because that is easier to debug.
