
Segmentation fault in a program that counts occurrences of words in a file using threads

So I have the following problem: implement a program that takes as arguments a file name followed by words. For each word, create a separate thread that counts its appearances in the given file. Print out the sum of the appearances of all words.

My code is:

#include <stdio.h>
#include <stdlib.h>
#include <strings.h>
#include <unistd.h>
#include <pthread.h>

pthread_mutex_t mtx; // used by each of the three threads to prevent  other threads from accessing global_sum during their additions

int global_sum = 0;
typedef struct{
                    char* word;
                    char* filename;
}MyStruct;



void *count(void*str)
{
    MyStruct *struc;
    struc = (MyStruct*)str; 
    const char *myfile = struc->filename;

    FILE *f;
    int count=0, j;
    char buf[50], read[100];
    // myfile[strlen(myfile)-1]='\0';
    if(!(f=fopen(myfile,"rt"))){
         printf("Wrong file name");
    }
    else
         printf("File opened successfully\n");
         for(j=0; fgets(read, 10, f)!=NULL; j++){
             if (strcmp(read[j],struc->word)==0)
                count++;
         }

    printf("the no of words is: %d \n",count);  
    pthread_mutex_lock(&mtx); // lock the mutex, to prevent other threads from accessing global_sum
    global_sum += count; // add thread's count result to global_sum
    pthread_mutex_unlock(&mtx); // unlock the mutex, to allow other threads to access the variable
}


int main(int argc, char* argv[]) {
    int i;
    MyStruct str; 

    pthread_mutex_init(&mtx, NULL); // initialize mutex
    pthread_t threads[argc-1]; // declare threads array 

    for (i=0;i<argc-2;i++){

       str.filename = argv[1];  
       str.word = argv[i+2];

       pthread_create(&threads[i], NULL, count, &str); 
    }

    for (i = 0; i < argc-1; ++i)
         pthread_join(threads[i], NULL);

    printf("The global sum is %d.\n", global_sum); // print global sum

    pthread_mutex_destroy(&mtx); // destroy the mutex

    return 0;

}

When I try to run it, I get a segmentation fault. Why is that? Thank you!

In main(), your two i loops use different bounds:

for (i=0;i<argc-2;i++){
    ...
    pthread_create(&threads[i], NULL, count, &str); 
}

and then

for (i = 0; i < argc-1; ++i)
    pthread_join(threads[i], NULL);

In this second loop you reference threads[argc-2], which was never created by the first loop (the first loop stops at threads[argc-3]).

First off, your code is terribly formatted; it's not even consistent. It also does not appear that you are compiling with warnings enabled.

If you are taking a university course and they did not tell you how to format the code and compile with warnings, I strongly suggest you ask your tutors what gives.

If using gcc, add -Wall -Wextra. For coding style, I recommend stealing one from either Linux or FreeBSD. There are various editors which format the code for you, including real editors like vim (which is worth trying out even though it may look harsh).

Your coding style helps you screw yourself over.

void *count(void*str)
{
    MyStruct *struc;
    struc = (MyStruct*)str;
    const char *myfile = struc->filename;

    FILE *f;
    int count=0, j;
    char buf[50], read[100];

buf is unused, which you would have learned if you had warnings enabled. read is a bad name: it shadows the POSIX read() function declared in <unistd.h>, which you include.

    // myfile[strlen(myfile)-1]='\0';
    if(!(f=fopen(myfile,"rt"))){
         printf("Wrong file name");
    }
    else

Because you don't return (as you should have), you are setting yourself up for a screwup where you execute code you should not. And guess what: your 'else' clause is defective. You are missing curly braces, so the for loop below is executed even if the file open operation failed, in which case fgets is called with a NULL FILE pointer.
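A minimal sketch of the early-return shape. count_chars is a hypothetical helper, not part of the question's code; it counts characters only so there is something to do once the file is open. Inside a thread function the equivalent is reporting the error and doing `return NULL;` immediately.

```c
#include <stdio.h>

/* Hypothetical helper: returns the number of characters in the file,
   or -1 if the file cannot be opened. The important part is that the
   failure branch returns, so nothing below can use a NULL FILE pointer. */
long count_chars(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL) {
        perror(path);   /* report why fopen failed, on stderr */
        return -1;      /* bail out now instead of falling through */
    }

    long n = 0;
    while (fgetc(f) != EOF)
        n++;

    fclose(f);
    return n;
}
```

With braces and an early return there is no way for the reading loop to run on a file that failed to open.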

         printf("File opened successfully\n");
         for(j=0; fgets(read, 10, f)!=NULL; j++){

10? Seems like a typo, as you likely meant 100. This would not have happened if you had used sizeof.
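A small sketch of the sizeof idiom; count_newlines is a hypothetical function used only for illustration. Note that sizeof buf gives the buffer size only because buf is an actual array here; inside a function that receives the buffer as a char * parameter, sizeof would give the pointer size instead.

```c
#include <stdio.h>
#include <string.h>

/* Returns how many complete lines (ending in '\n') the stream contains.
   sizeof buf always matches the declared size of buf, so resizing the
   array cannot leave a stale constant (like 10 vs 100) in the fgets call. */
int count_newlines(FILE *f)
{
    char buf[100];
    int lines = 0;

    while (fgets(buf, sizeof buf, f) != NULL) {
        /* A line longer than buf arrives in several chunks; only the
           chunk that contains '\n' finishes a line. */
        if (strchr(buf, '\n') != NULL)
            lines++;
    }
    return lines;
}
```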

             if (strcmp(read[j],struc->word)==0)
                count++;
         }

It is unclear what you are doing here. It seems you wanted to do strcmp starting from read[0], read[1] and so on. But each fgets call reads new data which replaces the contents of the buffer, and then you advance the index by one. This makes no sense whatsoever. Finally, you are doing it wrong anyway: read[j] is a single char, not an address, so strcmp ends up treating a character value as a pointer, which is enough on its own to crash. Once more, the compiler would have told you that if you had asked it to.

The strcmp approach is very bad anyway. Try an approach where you match the first character and work from there.
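A minimal sketch of "match the first character, then compare the rest", assuming the text is already in a buffer of known length (it does not need to be NUL-terminated). count_occurrences is a hypothetical name; it counts plain substrings, including overlapping ones, so "cat" also matches inside "concatenate". Add word-boundary checks if only whole words should count.

```c
#include <stddef.h>
#include <string.h>

/* Counts occurrences of word in the first len bytes of buf.
   The cheap first-character test filters most positions; memcmp
   then checks the remaining wlen - 1 characters. */
size_t count_occurrences(const char *buf, size_t len, const char *word)
{
    size_t wlen = strlen(word);
    size_t count = 0;

    if (wlen == 0 || wlen > len)
        return 0;

    for (size_t i = 0; i + wlen <= len; i++) {
        if (buf[i] == word[0] &&
            memcmp(buf + i + 1, word + 1, wlen - 1) == 0)
            count++;
    }
    return count;
}
```

For example, counting "aa" in "aaa" gives 2, because overlapping matches are counted.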

int main(int argc, char* argv[]) {

Standard misplaced '*'. Use char *argv[] instead.

    int i;
    MyStruct str;

    pthread_mutex_init(&mtx, NULL); // initialize mutex
    pthread_t threads[argc-1]; // declare threads array

Highly not recommended. Validate the arguments first and then have a dedicated variable which holds the number of threads. At that point you can allocate an array.
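One way this could look, as a sketch. setup_threads is a hypothetical helper, not from the question: it checks that there is at least one word, records the path and the thread count in named variables, and allocates the thread array from that count.

```c
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

/* Hypothetical helper: validates argv and allocates one pthread_t per word.
   On success stores the file path and thread count through path/nthreads
   and returns the array (the caller frees it). Returns NULL on bad usage
   or allocation failure. */
pthread_t *setup_threads(int argc, char *argv[], const char **path, int *nthreads)
{
    if (argc < 3) {
        fprintf(stderr, "usage: %s FILE WORD...\n", argc > 0 ? argv[0] : "count");
        return NULL;
    }

    *path = argv[1];        /* name the path once instead of repeating argv[1] */
    *nthreads = argc - 2;   /* one thread per word: argv[2] .. argv[argc-1] */

    pthread_t *threads = malloc((size_t)*nthreads * sizeof *threads);
    if (threads == NULL)
        perror("malloc");
    return threads;
}
```

Every later loop can then use nthreads as its bound, so creating and joining cannot disagree.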

    for (i=0;i<argc-2;i++){

       str.filename = argv[1];
       str.word = argv[i+2];

       pthread_create(&threads[i], NULL, count, &str);
    }

As with the thread count, save the path in a variable. Referring to it as argv[1] is bad style which will come back to bite you. Using argv for the words is fine.

However, this is wrong in general. You set up a single local struct, pass its address to a thread, and then immediately change it for the next thread. So at the end of the day your threads can all end up counting the same word, and the word a given thread was counting may change along the way.
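A minimal sketch of the fix: one argument struct per thread, each kept alive until its thread is joined. To keep the example free of file handling it assumes the text is already in memory as a string. struct task, count_word and count_all are hypothetical names, and the count is a plain overlapping substring count. Each thread writes only its own count field, so summing after the joins needs no mutex.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* One argument block per thread, so no thread sees another thread's word. */
struct task {
    const char *text;   /* shared, read-only input */
    const char *word;   /* this thread's word */
    long count;         /* this thread's result, written only by this thread */
};

static void *count_word(void *arg)
{
    struct task *t = arg;
    t->count = 0;
    if (t->word[0] == '\0')
        return NULL;
    /* Overlapping substring count: restart the search one past each hit. */
    for (const char *p = strstr(t->text, t->word); p != NULL; p = strstr(p + 1, t->word))
        t->count++;
    return NULL;
}

/* Hypothetical driver: one thread per word; returns the total or -1. */
long count_all(const char *text, char *words[], int nthreads)
{
    if (nthreads < 1)
        return 0;

    pthread_t *threads = malloc((size_t)nthreads * sizeof *threads);
    struct task *tasks = malloc((size_t)nthreads * sizeof *tasks);
    if (threads == NULL || tasks == NULL) {
        free(threads);
        free(tasks);
        return -1;
    }

    int started;
    for (started = 0; started < nthreads; started++) {
        tasks[started] = (struct task){ text, words[started], 0 };
        if (pthread_create(&threads[started], NULL, count_word, &tasks[started]) != 0)
            break;
    }

    /* Join exactly the threads that were created: same bound as above. */
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    long total = -1;
    if (started == nthreads) {
        total = 0;
        for (int i = 0; i < nthreads; i++)
            total += tasks[i].count;
    }

    free(threads);
    free(tasks);
    return total;
}
```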

    for (i = 0; i < argc-1; ++i)
         pthread_join(threads[i], NULL);

Go figure. You did not have a variable which held the number of threads, and this led to this inconsistency (argc - 1 vs argc - 2).

In general, this issue could mostly have been found by reading the compiler warnings, and avoided altogether if basic good practices had been employed.

Of course, bugs happen regardless, and in such cases these practices at least help you narrow them down.

Finally, a few words about the general approach. It is unclear what the point of the exercise was. You actually have to force yourself to use anything more than pthread_create and pthread_join. Let's assume the only requirement is to use threads.

I don't know if they force you to open the file multiple times or what. Opening and reading the file multiple times is not only wasteful but also opens you up to a situation where the file is replaced and some threads open a different one.

An OK solution would open the file once in main. Once it is open, you would fstat it for its size and mmap it. If for some reason you can't use mmap, you would malloc a large enough buffer and read the file into it.

Then all threads get the address of that buffer, the word to look for, and an address where they should store their counter (each thread gets a different address).

When all threads have exited, you loop over the counters to sum the results.

Either way, no locking is involved.
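A sketch of that design, assuming a POSIX system (open, fstat, mmap) and a plain substring count (overlapping matches counted, no word-boundary check). count_words_in_file, struct job and worker are hypothetical names. Each thread writes only to its own results[i] slot, and the sum is computed after every pthread_join, so no mutex is needed.

```c
#include <fcntl.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

struct job {
    const char *data;   /* start of the mapped file, shared and read-only */
    size_t size;        /* file size obtained from fstat */
    const char *word;   /* the word this thread looks for */
    long *result;       /* this thread's own counter slot */
};

static void *worker(void *arg)
{
    const struct job *j = arg;
    size_t wlen = strlen(j->word);
    long n = 0;

    /* The mapping is not NUL-terminated, so compare with memcmp and bounds. */
    for (size_t i = 0; wlen > 0 && i + wlen <= j->size; i++)
        if (j->data[i] == j->word[0] && memcmp(j->data + i, j->word, wlen) == 0)
            n++;

    *j->result = n;
    return NULL;
}

/* Opens and maps the file once; returns the total count or -1 on error. */
long count_words_in_file(const char *path, char *words[], int nwords)
{
    if (nwords < 1)
        return 0;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    size_t size = (size_t)st.st_size;
    if (size == 0) {            /* mmap with length 0 fails; nothing to count */
        close(fd);
        return 0;
    }

    char *data = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                  /* the mapping stays valid after close */
    if (data == MAP_FAILED)
        return -1;

    pthread_t *threads = malloc((size_t)nwords * sizeof *threads);
    struct job *jobs = malloc((size_t)nwords * sizeof *jobs);
    long *results = calloc((size_t)nwords, sizeof *results);
    long total = -1;

    if (threads != NULL && jobs != NULL && results != NULL) {
        int started;
        for (started = 0; started < nwords; started++) {
            jobs[started] = (struct job){ data, size, words[started], &results[started] };
            if (pthread_create(&threads[started], NULL, worker, &jobs[started]) != 0)
                break;
        }
        for (int i = 0; i < started; i++)
            pthread_join(threads[i], NULL);

        /* All threads have exited: plain summation, no lock. */
        if (started == nwords) {
            total = 0;
            for (int i = 0; i < nwords; i++)
                total += results[i];
        }
    }

    free(threads);
    free(jobs);
    free(results);
    munmap(data, size);
    return total;
}
```

From main this would be called as count_words_in_file(argv[1], argv + 2, argc - 2) after checking that argc >= 3, and built with something like gcc -Wall -Wextra -pthread.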

Reading (up to) 10 characters at a time will (probably) miss some instances of the word being searched for, for example when a word is split across two reads.

The strcmp() always starts at the beginning of those 10 characters.

1) You need to look for the target word at any location in the file.

2) You need to look for the target word at any location in the read-in buffer.

Suggest:

0) Clear the input buffer.
1) Input one char at a time, accumulating characters in the input buffer.
2) When a word separator is found (for instance a space or EOF),
3) check whether the accumulated word matches the target word.
4) If it matches, increment the count.
5) If EOF, exit; otherwise go to 0.
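The steps above could be sketched like this. count_word_in_stream is a hypothetical name, and two choices are assumptions: a separator is any character for which isspace() or ispunct() is true, and the comparison is case-sensitive. A word too long for the buffer is marked so that its truncated prefix cannot produce a false match.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Counts whole-word occurrences of target in stream f. */
int count_word_in_stream(FILE *f, const char *target)
{
    char word[64];
    size_t len = 0;
    int too_long = 0;
    int count = 0;
    int c;

    do {
        c = fgetc(f);                              /* step 1: one char at a time */
        if (c != EOF && !isspace(c) && !ispunct(c)) {
            if (len < sizeof word - 1)
                word[len++] = (char)c;             /* accumulate */
            else
                too_long = 1;                      /* truncated: must not match */
            continue;                              /* jumps to the while test */
        }
        /* step 2: a separator or EOF ends the current word */
        word[len] = '\0';
        if (len > 0 && !too_long && strcmp(word, target) == 0)  /* step 3 */
            count++;                               /* step 4 */
        len = 0;                                   /* step 0: clear the buffer */
        too_long = 0;
    } while (c != EOF);                            /* step 5 */

    return count;
}
```

Because the buffer is cleared at every separator, a word is only ever compared as a whole, and reading one character at a time means no word can be split across two reads.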
