
C program crashes when retrieving sizeof(buff)

I am creating a program in C that splits a large text file into 10 segments and then creates 10 threads, with each thread generating a word count for its segment. I took the function word_count from this code: https://github.com/prateek-khatri/seaOfC/blob/master/frequencyMultiThread.c . That program works fine for me, but when I tried to use word_count in my own program, it crashes when trying to get the size of the buffer.

It seems like everything is OK in the function getCurrentSegmentWordcount, but when that function calls word_count, it crashes (segmentation fault) at the line printf("sizeof Buff: %d", sizeof(buff));

#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>
#include <unistd.h>
#define NUMBER_OF_THREADS 10

//struct taken from reference:
struct return_val{
    char wordlist[100][100]; //[chars][lines]
    int count[100];
} *arr; //array of words

void *print_hello_world(void * tid)
{
    //This function prints the thread’s identifier and then exits.
    printf("Hello World. Greetings from thread %d\n", tid);
    pthread_exit(NULL);
}

void *word_count(void* num)
{ 

    int *ln = num;
    unsigned int line_number = *ln;
    //line_number++;

    printf("Thread %d\n",line_number);

    char cmd_p1[9] = "sed -n '\0";
    char cmd_p2[2];
    sprintf(cmd_p2,"%d",line_number); //stores string in buffer
    char cmd_p3[21] = "p' 'maintainers.txt'\0";
    char command[100];
    command[0] = '\0';

    //char * strcat ( char * destination, const char * source );
    //appends a copy of source to destination
    strcat(command,cmd_p1);
    strcat(command,cmd_p2);
    strcat(command,cmd_p3);
    usleep(line_number);

    char cmd[100] = " | tr [:space:] '\\n' | grep -v '^\\s*$' | sort | uniq -c | sort\0";
    strcat(command,cmd);
    printf("Command: %s\n",command);
    //fflush(stdout);



    FILE *in;
    in= popen(command, "r"); //read command and pipe into the shell
    rewind(in); //set file position to beginning of 'in'
    char buff[50];
    int counter = 0;


    //char * fgets ( char * str, int num, FILE * stream );
    //reads chars from stream and stores them as string into buff until all of buffer has been read
    printf("before\n");
    bool testBool = fgets(buff,sizeof(buff),in);
    printf("testBool: %d\n", testBool);


    //CRASH HAPPENS HERE:
    //buff 
    printf("sizeof Buff: %d", sizeof(buff));


    while(fgets(buff,sizeof(buff),in))
    {
        printf("fire 0.5");
        char c=' ';
        int i = 0;
        int cnt = atoi(buff); //converts string to int.. buff == # of chars in file?
        arr[line_number-1].count[counter] = cnt; //at this point line_number == 1
        printf("fire1\n");

        while(c!='\0')
        {
            c=buff[i];
            buff[i]=buff[i+6];
            i++;
        }


        int cnnt = 0;
        while(c!=' ')
        {
            c = buff[cnnt];
            cnnt++;
        }
        i=0;
        while(c!='\0')
        {
            c=buff[i];
            buff[i]=buff[i+cnnt];
            i++;
        }
        sprintf(arr[line_number-1].wordlist[counter],"%s",buff);
        printf("%d %s",arr[line_number-1].count[counter],arr[line_number-1].wordlist[counter]);
        counter++;
    }
    printf("final count: %d", counter);
    arr[line_number-1].count[counter] = -1;


    fclose(in);



    //pthread_exit(NULL); //didn't help to move here from getCurrentSegment...()
    return NULL;
}



void *getCurrentSegmentWordcount(void * tid) { //declaring file pointer (value?)
    int segment = tid;
    segment = segment + 1; //converts to int
    printf("segment/thread: %d \n", segment);
    char text[1000];
    //char buffer[150];
    FILE *fp = fopen("words.txt", "r");
    if(fp == NULL) {
        printf("null file");
    }
    int i = 0;

    long lSize;
    char *buffer;
    if( !fp ) perror("words.txt"),exit(1);

    fseek( fp , 0L , SEEK_END);
    lSize = ftell( fp );
    rewind( fp );

    buffer = calloc( 1, lSize+1 );
    if( !buffer ) fclose(fp),fputs("memory alloc fails",stderr),exit(1);

    if( 1!=fread( buffer , lSize, 1 , fp) )
      fclose(fp),free(buffer),fputs("entire read fails",stderr),exit(1);

    //printf(buffer);

    char *token = strtok(buffer, "~");

    if(segment == 1) {
        printf("segment 1: %s", token);
        word_count(&segment);
    }

    if(segment == 2) {
        token = strtok(NULL,"~");
        printf("segment 2: %s", token);
    }

    if(segment == 3) {
        token = strtok(NULL,"~");
        token = strtok(NULL,"~");
        printf("segment 3: %s", token);
    }

    if(segment == 4) {
        token = strtok(NULL,"~");
        token = strtok(NULL,"~");
        token = strtok(NULL,"~");
        printf("segment 4: %s", token);
    }

    fclose(fp);
    free(buffer);
    //pthread_exit(NULL);//moving to end of word_count()
}

int main(int argc, char *argv[])
{
    //The main program creates x threads and then exits.
    pthread_t threads[NUMBER_OF_THREADS];
    int status, i;

    for(i=0; i < NUMBER_OF_THREADS; i++) {
        printf("Main here. Creating thread %d\n", i+1);
        status = pthread_create(&threads[i], NULL, getCurrentSegmentWordcount, (void * )i);
        if (status != 0) {
            printf("Oops. pthread create returned error code %d\n", status);
            exit(-1);
        }
    }
    sleep(8);
    exit(NULL);
}

Output:

Main here. Creating thread 1
Main here. Creating thread 2
segment/thread: 1 
Main here. Creating thread 3
segment 1: test(segment 1, handled my thread 1)
Thread 1
Main here. Creating thread 4
Command: sed -n '1p' 'maintainers.txt' | tr [:space:] '\n' | grep -v '^\s*$' | sort | uniq -c | sort
Main here. Creating thread 5
segment/thread: 2 
before
segment/thread: 4 
Main here. Creating thread 6
segment 4: 
test test test test (segment 4, handled by thread 4)
Main here. Creating thread 7
segment 2: 
test test (segment 2, handled by thread 2)
Main here. Creating thread 8
Main here. Creating thread 9
Main here. Creating thread 10
segment/thread: 3 
segment 3: 
test test test (segment 3, handled by thread 3)
segment/thread: 10 
segment/thread: 9 
segment/thread: 8 
segment/thread: 5 
segment/thread: 6 
segment/thread: 7 
testBool: 1
Makefile:20: recipe for target 'all' failed
make: *** [all] Segmentation fault (core dumped)

There are many issues with this code. Some have already been mentioned by user3629249, so I'll try to summarize the errors here.

Passing (void * )i as the argument for the thread is rather ugly. Sure, it works, but to me this is sloppy programming. I'd declare an int array, fill it with the id values and pass a pointer to each location:

int ids[NUMBER_OF_THREADS];

for(i=0; i < NUMBER_OF_THREADS; i++) {
    ids[i] = i+1;
    status = pthread_create(&threads[i], NULL, getCurrentSegmentWordcount, ids + i);
    ...
}

and then in the thread function:

void *getCurrentSegmentWordcount(void * tid) { //declaring file pointer (value?)
    int segment = *((int*) tid);
    // segment = segment + 1; not needed anymore
    ...
}

This code is cleaner, easier to understand for you and for the code reviewer, does not rely on ugly, unnecessary casts, and is more portable.

Same thing with

void *print_hello_world(void *tid)
{
    //This function prints the thread’s identifier and then exits.
    printf("Hello World. Greetings from thread %d\n", tid);
    pthread_exit(NULL);
}

This is ugly: you are trying to pass a pointer as an int. The size of a pointer may not be the same as the size of an int. Pass a pointer to int instead, the same way as for getCurrentSegmentWordcount:

void *print_hello_world(void *tid)
{
    //This function prints the thread’s identifier and then exits.
    printf("Hello World. Greetings from thread %d\n", *((int*) tid));
    pthread_exit(NULL);
}

Write error messages to stderr. That stream exists for exactly this purpose, and it is what people expect programs to do. When you execute a program, you can do this:

$ program 2>/tmp/error.log

or this

$ program 2>/dev/null | some_other_tool

so that you can separate the normal output from the error output.

And when a system function fails, the errno variable is set to the error code. You can use perror for a standard error message or, if you want a custom one, use strerror:

pid_t p = fork();

if(p < 0)
{
    perror("fork failed");
    // or
    fprintf(stderr, "Error while executing fork: %s\n", strerror(errno));
    return; // or exit or whatever
}

You can write code in one line if you want to enter an obfuscated C contest; otherwise don't do that. It's hard to read for you and hard to read for the code reviewer/co-worker/superior. You gain nothing from it.

Instead of

if( !buffer ) fclose(fp),fputs("memory alloc fails",stderr),exit(1);

do

if(buffer == NULL)
{
    fclose(fp);
    fputs("memory alloc fails", stderr);
    exit(EXIT_FAILURE); // or exit(your_exit_status)
}

It's easier to read for everyone.


You should always check the return value of functions that return a pointer. Check the return value of malloc, calloc, realloc, strtok, etc.

if(segment == 2) {
    token = strtok(NULL,"~");
    printf("segment 2: %s", token);
}

If strtok returns NULL, then the printf line yields undefined behaviour. See 3.5.3.3 comment 2:

3.5.3.3:

Synopsis

  #define __STDC_WANT_LIB_EXT1__ 1
  #include <stdio.h>
  int printf_s(const char * restrict format, ...);

[...]

2 format shall not be a null pointer. The %n specifier (modified or not by flags, field width, or precision) shall not appear in the string pointed to by format. Any argument to printf_s corresponding to a %s specifier shall not be a null pointer.

[...]

4 The printf_s function is equivalent to the printf function except for the explicit runtime-constraints listed above.

Some libc implementations may forgive you for passing NULL to printf with %s and print (null), but this is not portable and is undefined behaviour. So only do the printf if token is not NULL.
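A minimal sketch of that check. The helpers nth_token and print_segment are hypothetical names, not part of the original program. The sketch also swaps strtok for POSIX strtok_r, on the assumption that several threads tokenize at the same time: strtok keeps hidden static state and is not safe for that.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns the n-th (1-based) token of buf split on
 * delim, or NULL if buf has fewer than n tokens. buf is modified in place. */
char *nth_token(char *buf, const char *delim, int n)
{
    char *save = NULL;
    char *token = strtok_r(buf, delim, &save);

    for (int i = 1; token != NULL && i < n; i++)
        token = strtok_r(NULL, delim, &save);

    return token;
}

/* Prints the segment only when it exists.
 * Returns 0 on success, -1 if the segment is missing. */
int print_segment(char *buf, int segment)
{
    char *token = nth_token(buf, "~", segment);

    if (token == NULL) {
        fprintf(stderr, "segment %d not found\n", segment);
        return -1;
    }
    printf("segment %d: %s\n", segment, token);
    return 0;
}
```

With something like this, the chain of if(segment == N) blocks could collapse into one print_segment(buffer, segment) call, and a file with fewer than N segments produces an error message instead of undefined behaviour.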


The word_count function is a little bit horrible, especially how you construct the commands.

char cmd_p1[9] = "sed -n '\0";

can be rewritten as

char cmd_p1[] = "sed -n '";

This creates a char array with the correct number of bytes and initializes it with a valid 0-terminated string; there is no need to add the '\0' yourself.

The parts of the command that never change, meaning they don't need a value from a variable, can be stored in a char[] or even in a const char*. Then construct the whole thing with snprintf and sprintf, which means fewer lines and fewer mistakes:

void *word_count(void* num)
{
    ...
    const char *pipe_cmd = "| tr [:space:] '\\n' | grep -v '^\\s*$' | sort | uniq -c | sort";
    const char *format = "sed -n '%dp' 'maintainers.txt' %s";

    int cmd_size = snprintf(NULL, 0, format, line_number, pipe_cmd);

    char *command = malloc(cmd_size + 1);
    if(command == NULL)
        return NULL;

    sprintf(command, format, line_number, pipe_cmd);

    ...

    FILE *in;
    in= popen(command, "r");
    free(command);
    ...
}

Also note that

char cmd_p2[2];
sprintf(cmd_p2,"%d",line_number); //stores string in buffer

will overflow the buffer if the line number is greater than 9, because cmd_p2 has room for only one digit plus the terminating '\0'.
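If you would rather keep a fixed-size buffer than allocate one, snprintf can also tell you when the result did not fit. This is only a sketch: build_command is a hypothetical helper, and the command text is shortened to the sed part.

```c
#include <stdio.h>

/* Hypothetical helper: writes the sed command for line_number into out.
 * Returns 0 on success, -1 if the result did not fit or formatting failed. */
int build_command(char *out, size_t out_size, unsigned int line_number)
{
    int n = snprintf(out, out_size, "sed -n '%up' 'maintainers.txt'", line_number);

    /* snprintf returns the length it wanted to write, excluding the '\0',
     * so a value >= out_size means the output was truncated. */
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    return 0;
}
```

Unlike sprintf into cmd_p2[2], this never writes past the end of the buffer; a buffer that is too small shows up as a -1 return value that the caller can handle.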


bool testBool = fgets(buff,sizeof(buff),in);
printf("testBool: %d\n", testBool);

fgets returns a pointer to char, not a bool. Assigning the result to a bool only records whether the pointer was non-NULL, so the printf shows 1 or 0 and says nothing about what was read. Do not try to print the pointer itself with %d either: a pointer is not necessarily the same size as an int; on my system a pointer is 8 bytes long and an int is 4 bytes long. You should do:

if(fgets(buff, sizeof(buff), in))
    puts("fgets success");
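Putting the pointer checks together for the popen part, a sketch could look like the following. count_output_lines is a hypothetical helper, and the 256-byte buffer is an arbitrary choice for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: runs command through the shell and counts the
 * newline-terminated lines it prints.
 * Returns the line count, or -1 if the command could not be started. */
int count_output_lines(const char *command)
{
    FILE *in = popen(command, "r");
    if (in == NULL) {
        perror("popen");
        return -1;
    }

    char buff[256];
    int lines = 0;

    /* fgets returns buff on success and NULL at end of input or on error. */
    while (fgets(buff, sizeof buff, in) != NULL) {
        if (strchr(buff, '\n') != NULL)
            lines++;
    }

    pclose(in);  /* a stream opened with popen is closed with pclose */
    return lines;
}
```

Two details differ from the code in the question: there is no rewind, because a pipe cannot seek, and the stream is closed with pclose rather than fclose.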

//CRASH HAPPENS HERE:
//buff 
printf("sizeof Buff: %d", sizeof(buff));
  1. It won't crash because of the sizeof. sizeof is evaluated at compile time, not at run-time.
  2. The sizeof operator returns a size_t.
  3. %d is not the correct specifier for size_t; %zu is. It should be

     printf("sizeof buff: %zu\n", sizeof buff);
  4. It will most probably crash because of all the undefined behaviour before this point.
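A small sketch of the size_t formatting, using a hypothetical helper format_buffer_size so the result can be inspected. The trailing newline is worth keeping while debugging: stdout is usually line-buffered on a terminal, so text without a newline may still sit in the buffer when the program crashes, which makes the crash look like it happened earlier than it did.

```c
#include <stdio.h>

/* Hypothetical helper: formats the size of a 50-byte array into out using
 * %zu, the conversion specifier for size_t.
 * Returns the number of characters snprintf wanted to write. */
int format_buffer_size(char *out, size_t out_size)
{
    char buff[50];

    /* sizeof buff is a compile-time constant here: 50 * sizeof(char). */
    return snprintf(out, out_size, "sizeof buff: %zu\n", sizeof buff);
}
```

Calling it with a 32-byte output buffer yields the text "sizeof buff: 50" followed by a newline.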


arr[line_number-1].count[counter] = cnt;

Nowhere in your code is arr made to point at any storage. As a global it starts out as a null pointer, so arr[line_number-1] accesses memory through a pointer that points nowhere. That's undefined behaviour and might lead to a segfault.
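One way to give arr storage before any thread starts is sketched below. allocate_results and free_results are hypothetical names, and one slot per thread is an assumption based on the arr[line_number-1] indexing.

```c
#include <stdlib.h>

#define NUMBER_OF_THREADS 10

struct return_val {
    char wordlist[100][100];
    int count[100];
};

/* One result slot per thread; NULL until allocate_results succeeds. */
static struct return_val *arr = NULL;

/* Allocates zeroed storage for every thread's results.
 * Returns 0 on success, -1 if the allocation fails. */
int allocate_results(void)
{
    arr = calloc(NUMBER_OF_THREADS, sizeof *arr);
    return arr == NULL ? -1 : 0;
}

/* Releases the storage once all threads have finished. */
void free_results(void)
{
    free(arr);
    arr = NULL;
}
```

main would call allocate_results() before the pthread_create loop and free_results() after the threads are done. Note that word_count also writes count[counter] and wordlist[counter] without checking that counter stays below 100.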


I want to quote user3629249 here:

user3629249 wrote:

the main() function is starting several threads, then immediately exiting. The process of exiting also eliminates the threads. Suggest: in main(), call pthread_join() for each thread; in the thread, at the end, call pthread_exit()
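A sketch of that suggestion, combined with the ids array from earlier. worker and run_workers are hypothetical names, and the worker body is a placeholder for the real per-segment work.

```c
#include <pthread.h>

#define NUMBER_OF_THREADS 10

/* Hypothetical worker: doubles the id it receives, in place. */
static void *worker(void *arg)
{
    int *id = arg;
    *id *= 2;
    return NULL;  /* returning from the start routine ends the thread */
}

/* Starts the workers and waits for all of them.
 * Returns 0 on success, or the pthread error code of the first failure. */
int run_workers(int ids[NUMBER_OF_THREADS])
{
    pthread_t threads[NUMBER_OF_THREADS];
    int started = 0;
    int status = 0;

    for (int i = 0; i < NUMBER_OF_THREADS; i++) {
        ids[i] = i + 1;
        status = pthread_create(&threads[i], NULL, worker, &ids[i]);
        if (status != 0)
            break;
        started++;
    }

    /* Join every thread that was started, instead of sleeping and hoping. */
    for (int i = 0; i < started; i++) {
        int join_status = pthread_join(threads[i], NULL);
        if (status == 0)
            status = join_status;
    }
    return status;
}
```

pthread_join blocks until the given thread has finished, so no sleep(8) is needed and main cannot exit while workers are still running. Build with the compiler's thread option, for example -pthread with gcc.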


Please don't ignore compiler warnings. They are not there to annoy you, they are there to help you. They are a hint that what you are doing may not be what you really want. Undefined behaviour, segfaults etc. are often a consequence of ignoring them. So heed the compiler's warnings: when you see one, look at your code, try to understand the warning and fix it. If you don't understand the warning, you can come here and ask a question about it. But having thousands of warnings and ignoring them will lead to headaches and, quite frankly, a lot of wasted time on your side and ours.

So, please fix all these warnings and details, look at the warning messages of the compiler, and the code might run without problems.


 