
C programming in Linux: not getting correct output for program that finds number of occurrences of substring in file

I am writing a program that counts how many times each substring given on the command line occurs in a text file (whose name is also given on the command line). The file is read into a buffer first.

When I run the code in bash, I get the error: Segmentation fault (core dumped). I am still learning C in this environment and suspect the segmentation fault comes from misusing dynamic memory allocation, but I could not find the problem. All I could conclude was that it comes from inside the for loop (I marked the suspected spot in the code).

EDIT: I fixed the segmentation fault by changing argv[j] to argv[i]. However, when I run the code now, count1 is always 0 even if the substring occurs multiple times in the text file, and I cannot see what is wrong even after going through the code several times.

$ more foo.txt

aabbccc

$ ./main foo.txt a

0

#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <string.h>

int main(int argc, char *argv[]) {

    FILE *fp; 
    long lsize; 
    char *buf;
    int count = 0, count1 = 0; 
    int i, j, k, l1, l2;   

    if (argc < 3) { printf("Error: insufficient arguments.\n"); return(1); };

    fp = fopen(argv[1], "r"); 

    if (!fp) { 
        perror(argv[1]); 
        exit(1); 
    }

    //get size of file 
    fseek(fp, 0L, SEEK_END);
    lsize = ftell(fp); 
    rewind(fp);

    //allocate memory for entire content
    buf = calloc(1, lsize+1);

    if (!buf) { 
        fclose(fp); 
        fputs("Memory alloc fails.\n", stderr); 
        exit(1); 
    }

    //copy the file into the buffer
    if (1 != fread(buf, lsize, 1, fp)) {
        fclose(fp);
        free(buf); 
        fputs("Entire read fails.\n", stderr); 
        exit(1); 
    }

    l1 = strlen(buf);

    //error is somewhere here
    for (i = 2; i < argc; i++) {
        for (j = 0; j < l1;) {
            k = 0; 
            count = 0; 
            while ((&buf[j] == argv[k])) {
                count++;
                j++; 
                k++; 
            }
            if (count == strlen(argv[j])) {
                count1++; 
                count = 0; 
            }
            else
                j++; 
        }
        printf("%d\n", count1);
    }

    fclose(fp); 

    return 0; 
}

fread(buf, lsize, 1, fp) reads 1 block of lsize bytes. fread doesn't care about the contents and won't add a '\0' terminating byte, so l1 = strlen(buf); is only well defined if the buffer is terminated some other way; on an unterminated buffer it is undefined behaviour. (Your calloc(1, lsize+1) happens to zero the extra byte, but it is safer not to rely on that.) Your counting has errors as well. Note that files usually don't have a 0-terminating byte at the end. That applies even to files containing text, which usually end with a newline.

Set the 0-terminating byte yourself:

if (1 != fread(buf, lsize, 1, fp)) {
    fclose(fp);
    free(buf); 
    fputs("Entire read fails.\n", stderr); 
    exit(1); 
}

buf[lsize] = '\0';
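As a minimal sketch of the read-and-terminate step in one place, the helper below loads a file into a heap buffer and writes the terminator itself. The name read_whole_file is hypothetical and not from the question. It also assumes it is acceptable to call fread with an item size of 1, so the return value is a byte count and an empty file is not reported as a failure.

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads the whole file at `path` into a newly allocated, '\0'-terminated
 * buffer. Returns NULL on failure; the caller frees the result.
 * read_whole_file is a hypothetical helper name, not part of the question. */
char *read_whole_file(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (!fp)
        return NULL;

    /* Determine the size the same way the question does. */
    if (fseek(fp, 0L, SEEK_END) != 0) { fclose(fp); return NULL; }
    long lsize = ftell(fp);
    if (lsize < 0) { fclose(fp); return NULL; }
    rewind(fp);

    char *buf = malloc((size_t)lsize + 1);
    if (!buf) { fclose(fp); return NULL; }

    /* With an item size of 1, fread returns the number of bytes read. */
    size_t nread = fread(buf, 1, (size_t)lsize, fp);
    fclose(fp);

    buf[nread] = '\0';   /* fread never adds the terminator itself */
    return buf;
}
```

Because the terminator is written at buf[nread], the buffer is a valid C string even when it comes from malloc and not calloc.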

And you can use strstr to get the location of the substring, like this:

for(i = 2; i < argc; ++i)
{
    char *content = buf;
    int count = 0;

    while((content = strstr(content, argv[i])))
    {
        count++;
        content++; // step one char past the start of this match
    }

    printf("The substring '%s' appears %d time(s)\n", argv[i], count);

}
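The strstr loop above can be wrapped in a small function, which makes it easier to check on its own. This is only a sketch: count_occurrences is a hypothetical name, and the guard for an empty search string is an added assumption (strstr matches an empty string at every position, so without the guard the loop would walk past the end of the buffer).

```c
#include <string.h>

/* Counts occurrences of `needle` in `haystack`, overlapping matches
 * included. count_occurrences is a hypothetical helper name. */
int count_occurrences(const char *haystack, const char *needle)
{
    int count = 0;

    if (*needle == '\0')
        return 0;   /* assumption: an empty needle counts as zero matches */

    const char *p = haystack;
    while ((p = strstr(p, needle)) != NULL) {
        count++;
        p++;        /* step one char past the match start so overlaps are found */
    }
    return count;
}
```

With the file from the question, count_occurrences("aabbccc\n", "a") gives 2. Advancing by one character counts overlapping matches, so "aa" is found three times in "aaaa"; advancing by strlen(needle) would count non-overlapping matches only.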

Your counting is wrong; there are several errors. This comparison

&buf[j] == argv[k]

is wrong: you are comparing pointers, not the contents. You have to use strcmp to compare strings. In this case you would have to use strncmp, because you only want to match the substring:

while(strncmp(&buf[j], argv[k], strlen(argv[k])) == 0)
{
    // substring matched
}

but this is also wrong, because you are incrementing k as well, which moves on to the next argument. You might end up reading beyond the limits of argv if the substring is longer than the number of arguments. Based on your code, you would have to compare characters:

while(buf[j] == argv[i][k])
{
    j++;
    k++;
}

You would have to increment the counter only when a substring is matched, like this:

l1 = strlen(buf);

for (i = 2; i < argc; i++) {
    int count = 0;
    int k = 0; // running index for inspecting argv[i]
    for (j = 0; j < l1; ++j) {
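        // stop at the end of argv[i]; otherwise a match at the very
        // end of buf would compare the two terminators as equal and run on
        while(argv[i][k] != 0 && buf[j + k] == argv[i][k])
            k++;

        // if all characters of argv[i] 
        // matched, argv[i][k] will be the
        // 0-terminating byte
        if(argv[i][k] == 0)
            count++;

        // reset running index for argv[i]
        // go to next char of buf
        k = 0;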
    }

    printf("The substring '%s' appears %d time(s)\n", argv[i], count);
}
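The strncmp idea mentioned earlier also works if the substring pointer is left fixed and only the position in the buffer moves. A minimal sketch, assuming the buffer is already '\0'-terminated; count_with_strncmp is a hypothetical name, and treating an empty substring as zero matches is an added assumption.

```c
#include <string.h>

/* Counts occurrences of `needle` in `buf` by comparing strlen(needle)
 * characters at every starting position. Overlapping matches are counted.
 * count_with_strncmp is a hypothetical helper name. */
int count_with_strncmp(const char *buf, const char *needle)
{
    size_t n = strlen(needle);
    size_t len = strlen(buf);
    int count = 0;

    if (n == 0 || n > len)
        return 0;   /* assumption: empty or too-long needle counts as zero */

    /* j + n <= len keeps every comparison inside the buffer */
    for (size_t j = 0; j + n <= len; j++) {
        if (strncmp(&buf[j], needle, n) == 0)
            count++;
    }
    return count;
}
```

In main this would be called once per argument, for example count_with_strncmp(buf, argv[i]) for each i from 2 to argc - 1.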
