
C programming in Linux: not getting correct output for program that finds number of occurrences of substring in file

I am writing a program that finds the number of occurrences of input substrings from the command line inside a text file (also read from the command line) which is written into a buffer.

When I run the code in bash, I get the error: Segmentation fault (core dumped). I am still learning how to code with C in this environment and have some sort of idea as to why the segmentation fault occurred (misuse of dynamic memory allocation?), but I could not find the problem with it. All I could conclude was that the problem is coming from within the for loop (I labeled where the potential error is being caused in the code).

EDIT: I managed to fix the segmentation fault error by changing argv[j] to argv[i] , however when I run the code now, count1 always returns 0 even if the substring occurs multiple times in the text file and I am not sure what is wrong even though I have gone through the code multiple times.

$ more foo.txt

aabbccc

$ ./main foo.txt a

0

#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <string.h>

int main(int argc, char *argv[]) {

    FILE *fp; 
    long lsize; 
    char *buf;
    int count = 0, count1 = 0; 
    int i, j, k, l1, l2;   

    if (argc < 3) { printf("Error: insufficient arguments.\n"); return(1); };

    fp = fopen(argv[1], "r"); 

    if (!fp) { 
        perror(argv[1]); 
        exit(1); 
    }

    //get size of file 
    fseek(fp, 0L, SEEK_END);
    lsize = ftell(fp); 
    rewind(fp);

    //allocate memory for entire content
    buf = calloc(1, lsize+1);

    if (!buf) { 
        fclose(fp); 
        fputs("Memory alloc fails.\n", stderr); 
        exit(1); 
    }

    //copy the file into the buffer
    if (1 != fread(buf, lsize, 1, fp)) {
        fclose(fp);
        free(buf); 
        fputs("Entire read fails.\n", stderr); 
        exit(1); 
    }

    l1 = strlen(buf);

    //error is somewhere here
    for (i = 2; i < argc; i++) {
        for (j = 0; j < l1;) {
            k = 0; 
            count = 0; 
            while ((&buf[j] == argv[k])) {
                count++;
                j++; 
                k++; 
            }
            if (count == strlen(argv[j])) {
                count1++; 
                count = 0; 
            }
            else
                j++; 
        }
        printf("%d\n", count1);
    }

    fclose(fp); 

    return 0; 
}

fread(buf, lsize, 1, fp) reads one block of lsize bytes. fread does not care about the contents and never appends a '\0' terminating byte, so calling strlen on a buffer that was never terminated is undefined behaviour. In your code calloc(1, lsize+1) happens to zero the extra byte, so strlen(buf) is safe here, but that safety disappears as soon as the buffer comes from malloc. Note that files usually do not contain a 0-terminating byte at the end; that applies even to text files, which usually end with a newline.

It is better to set the 0-terminating byte yourself than to rely on the allocator:

if (1 != fread(buf, lsize, 1, fp)) {
    fclose(fp);
    free(buf); 
    fputs("Entire read fails.\n", stderr); 
    exit(1); 
}

buf[lsize] = '\0';
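As a rough sketch, the read-and-terminate steps could be wrapped in one helper. The name read_file is made up for illustration and is not part of the original program; it assumes a regular file whose size ftell can report.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not from the original code): read the whole file
 * at `path` into a freshly allocated, NUL-terminated buffer.
 * Returns NULL on failure. The caller must free() the result. */
char *read_file(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (!fp)
        return NULL;

    /* Determine the file size; this only works for regular files. */
    if (fseek(fp, 0L, SEEK_END) != 0) {
        fclose(fp);
        return NULL;
    }
    long size = ftell(fp);
    if (size < 0) {
        fclose(fp);
        return NULL;
    }
    rewind(fp);

    char *buf = malloc((size_t)size + 1);
    if (!buf) {
        fclose(fp);
        return NULL;
    }

    /* With an item size of 1, fread returns the number of bytes read. */
    size_t got = fread(buf, 1, (size_t)size, fp);
    fclose(fp);

    buf[got] = '\0';    /* terminate explicitly: fread never does */
    return buf;
}
```

Swapping the size and count arguments of fread, compared with the code in the question, means an empty file is not reported as a failed read; it simply yields an empty string.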

And you can use strstr to get the location of the substring, like this:

for(i = 2; i < argc; ++i)
{
    char *content = buf;
    int count = 0;

    while((content = strstr(content, argv[i])))
    {
        count++;
        content++; // step past the start of this match so the next search finds later occurrences
    }

    printf("The substring '%s' appears %d time(s)\n", argv[i], count);

}
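The same strstr loop can be pulled into a small function so it is easy to check on known strings. This is only a sketch: count_occurrences is a name invented here, and treating an empty substring as zero matches is an assumption, since the loop above does not handle that case.

```c
#include <string.h>

/* Hypothetical helper: count occurrences of `needle` in `haystack`,
 * including overlapping ones. An empty needle is treated as 0 matches
 * (an assumption made for this sketch). */
size_t count_occurrences(const char *haystack, const char *needle)
{
    size_t count = 0;

    if (*needle == '\0')
        return 0;

    /* After each hit, restart the search one character past its start. */
    for (const char *p = strstr(haystack, needle);
         p != NULL;
         p = strstr(p + 1, needle))
        count++;

    return count;
}
```

For the foo.txt contents from the question, count_occurrences("aabbccc\n", "a") gives 2. Because the search restarts one character after each hit, overlapping matches are counted too, so "cc" is also found twice in "ccc".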

Your counting loop also has several errors. This comparison

&buf[j] == argv[k]

is wrong: it compares pointers, not contents. Strings are compared with strcmp. In this case you would have to use strncmp, because you only want to match the substring against the start of the remaining buffer:

while(strncmp(&buf[j], argv[k], strlen(argv[k])) == 0)
{
    // substring matched
}

but this is also wrong, because you are incrementing k as well, which moves on to the next argument instead of the next character. If the substring is longer than the number of arguments, you will eventually read beyond the limits of argv. Based on your code, you would have to compare characters:

while(buf[j] == argv[i][k])
{
    j++;
    k++;
}

You would have to increment the counter only when a substring is matched, like this:

l1 = strlen(buf);

for (i = 2; i < argc; i++) {
    int count = 0;
    int k = 0; // running index for inspecting argv[i]
    for (j = 0; j < l1; ++j) {
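        // stop at the end of argv[i]; otherwise a match at the very
        // end of buf would compare the two terminators and run on
        while(argv[i][k] != '\0' && buf[j + k] == argv[i][k])
            k++;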

        // if all characters of argv[i] 
        // matched, argv[i][k] will be the
        // 0-terminating byte
        if(argv[i][k] == 0)
            count++;

        // reset running index for argv[i]
        // before moving on to the next char of buf
        k = 0;
    }

    printf("The substring '%s' appears %d time(s)\n", argv[i], count);
}
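The strncmp comparison mentioned earlier also works once the substring stays fixed and only the position in the buffer moves. A minimal sketch follows; count_with_strncmp is an invented name, and returning 0 for an empty substring is an assumption.

```c
#include <string.h>

/* Hypothetical helper: count (possibly overlapping) occurrences of
 * `sub` in `buf` by comparing strlen(sub) bytes at every offset. */
size_t count_with_strncmp(const char *buf, const char *sub)
{
    size_t sublen = strlen(sub);
    size_t buflen = strlen(buf);
    size_t count = 0;

    /* Nothing to find if sub is empty or longer than buf. */
    if (sublen == 0 || sublen > buflen)
        return 0;

    /* Only offsets where a full copy of sub still fits are tested. */
    for (size_t j = 0; j + sublen <= buflen; j++) {
        if (strncmp(buf + j, sub, sublen) == 0)
            count++;
    }

    return count;
}
```

In the original main this would be called once per argument, for example count_with_strncmp(buf, argv[i]) inside the loop over i.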
