Segmentation fault in a program that uses threads to count word occurrences in a file

Question

I have the following problem: implement a program that takes as arguments a file name followed by words. For each word, create a separate thread that counts its appearances in the given file. Print the sum of the appearances of all words.

My code is:

#include <stdio.h>
#include <stdlib.h>
#include <strings.h>
#include <unistd.h>
#include <pthread.h>

pthread_mutex_t mtx; // used by each of the three threads to prevent  other threads from accessing global_sum during their additions

int global_sum = 0;
typedef struct{
                    char* word;
                    char* filename;
}MyStruct;



void *count(void*str)
{
    MyStruct *struc;
    struc = (MyStruct*)str; 
    const char *myfile = struc->filename;

    FILE *f;
    int count=0, j;
    char buf[50], read[100];
    // myfile[strlen(myfile)-1]='\0';
    if(!(f=fopen(myfile,"rt"))){
         printf("Wrong file name");
    }
    else
         printf("File opened successfully\n");
         for(j=0; fgets(read, 10, f)!=NULL; j++){
             if (strcmp(read[j],struc->word)==0)
                count++;
         }

    printf("the no of words is: %d \n",count);  
    pthread_mutex_lock(&mtx); // lock the mutex, to prevent other threads from accessing global_sum
    global_sum += count; // add thread's count result to global_sum
    pthread_mutex_unlock(&mtx); // unlock the mutex, to allow other threads to access the variable
}


int main(int argc, char* argv[]) {
    int i;
    MyStruct str; 

    pthread_mutex_init(&mtx, NULL); // initialize mutex
    pthread_t threads[argc-1]; // declare threads array 

    for (i=0;i<argc-2;i++){

       str.filename = argv[1];  
       str.word = argv[i+2];

       pthread_create(&threads[i], NULL, count, &str); 
    }

    for (i = 0; i < argc-1; ++i)
         pthread_join(threads[i], NULL);

    printf("The global sum is %d.\n", global_sum); // print global sum

    pthread_mutex_destroy(&mtx); // destroy the mutex

    return 0;

}

When I try to run it, I get a segmentation fault. Why is that? Thank you!

Answer 1

In main(), your two i loops have different bounds:

for (i=0;i<argc-2;i++){
    ...
    pthread_create(&threads[i], NULL, count, &str); 
}

and then

for (i = 0; i < argc-1; ++i)
    pthread_join(threads[i], NULL);

The second loop references threads[argc-2], which was never created in the first loop. Passing that uninitialized pthread_t to pthread_join is undefined behavior.

Answer 2

First off, your code is terribly formatted. It's not even consistent. It also does not appear that you are compiling with warnings enabled.

If you are taking a university course and they did not tell you how to format code and compile with warnings, I strongly suggest you ask your tutors what gives.

If using gcc, add -Wall -Wextra. For coding style, I recommend stealing one from either Linux or FreeBSD. There are various editors which format the code for you, including real editors like vim (which is worth trying out even though it may look harsh).

Your coding style helps you screw yourself over.

void *count(void*str)
{
    MyStruct *struc;
    struc = (MyStruct*)str;
    const char *myfile = struc->filename;

    FILE *f;
    int count=0, j;
    char buf[50], read[100];

buf is unused, which you would have learned had warnings been enabled. read is a bad name, since it shadows the read() function declared in <unistd.h>.

    // myfile[strlen(myfile)-1]='\0';
    if(!(f=fopen(myfile,"rt"))){
         printf("Wrong file name");
    }
    else

Because you don't return here (as you should), you are setting yourself up to execute code you should not. And guess what, your 'else' clause is defective. You are missing curly braces, so the for loop below is executed even if the file open operation failed, which passes a NULL FILE pointer to fgets.
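A minimal sketch of the early-return pattern described above. count_lines is a hypothetical helper (it is not part of the question's code) used only to show the shape: check fopen, bail out immediately on failure, and only then touch the stream.

```c
#include <stdio.h>

/* Hypothetical helper: returns the number of '\n' characters in the file
 * at path, or -1 if the file cannot be opened. */
long count_lines(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL) {
        perror(path);
        return -1;              /* bail out: f must not be used below */
    }

    long lines = 0;
    int c;
    while ((c = fgetc(f)) != EOF) {
        if (c == '\n')
            lines++;
    }

    fclose(f);
    return lines;
}
```

With the failure path returning early, no else branch is needed at all, so there is nothing for a missing pair of braces to break.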

         printf("File opened successfully\n");
         for(j=0; fgets(read, 10, f)!=NULL; j++){

10? Seems like a typo, as you likely meant 100. This would not have happened if you had used sizeof.
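As an illustration of the sizeof point, here is a small sketch in which the buffer size is written exactly once. longest_chunk is a hypothetical function invented for this example; it is not what the question's code was trying to do.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns the length (without the newline) of the
 * longest piece that fgets() hands back. Lines longer than the buffer
 * arrive in several pieces, so this is only the true longest line when
 * every line fits in 99 characters. */
size_t longest_chunk(FILE *f)
{
    char line[100];
    size_t longest = 0;

    /* sizeof line always matches the declaration above, so the two
     * numbers cannot drift apart the way 10 and 100 did. */
    while (fgets(line, sizeof line, f) != NULL) {
        size_t len = strcspn(line, "\n");
        if (len > longest)
            longest = len;
    }
    return longest;
}
```

Note that sizeof only gives the array size while line is a real array in scope; once the buffer is passed to another function as a pointer, the size has to be passed along explicitly.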

             if (strcmp(read[j],struc->word)==0)
                count++;
         }

It is unclear what you are doing here. It seems you wanted to do strcmp starting from read[0], read[1] and so on. But each iteration reads new data which replaces the contents of the buffer, and then you advance the index by one. This makes no sense whatsoever. Finally, you are doing it wrong anyway: read[j] is a char, not an address, and once more the compiler would have told you that if you had asked it to.

The strcmp approach is very bad anyway. Try an approach where you match the first character and work from there.
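One possible reading of "match the first character and work from there", sketched under the assumption that the text is already in memory as a NUL-terminated string. count_occurrences is a hypothetical name, and it counts substring matches (including overlapping ones), not whole words.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts occurrences of word anywhere in text.
 * Overlapping matches are counted, so "aa" occurs 3 times in "aaaa". */
size_t count_occurrences(const char *text, const char *word)
{
    size_t wlen = strlen(word);
    size_t count = 0;

    if (wlen == 0)
        return 0;               /* an empty word matches nothing here */

    /* Jump to each position whose first character matches, then compare
     * the rest of the word from that position. */
    for (const char *p = text; (p = strchr(p, word[0])) != NULL; p++) {
        if (strncmp(p, word, wlen) == 0)
            count++;
    }
    return count;
}
```

For example, count_occurrences("abcabcab", "abc") returns 2. Whether "then" should count as an occurrence of "the" depends on the assignment, which the question does not specify.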

int main(int argc, char* argv[]) {

Standard misplaced '*'. Use char *argv[] instead.

    int i;
    MyStruct str;

    pthread_mutex_init(&mtx, NULL); // initialize mutex
    pthread_t threads[argc-1]; // declare threads array

Highly not recommended. Validate the arguments first, then have a dedicated variable which holds the number of threads. At that point you can allocate an array.
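A sketch of what that could look like. setup_threads is a hypothetical helper, and the usage message format is an assumption; the point is only that the argument check happens first and that the thread count and file path each live in one named variable.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: validates the command line, stores the file path
 * and the number of threads, and returns a heap-allocated pthread_t array
 * that the caller must free. Returns NULL on bad usage or if the
 * allocation fails. */
pthread_t *setup_threads(int argc, char *argv[],
                         const char **path, size_t *nthreads)
{
    if (argc < 3) {
        fprintf(stderr, "usage: %s FILE WORD...\n",
                argc > 0 ? argv[0] : "prog");
        return NULL;
    }

    *path = argv[1];                  /* the file name, saved once */
    *nthreads = (size_t)(argc - 2);   /* one thread per word */
    return calloc(*nthreads, sizeof(pthread_t));
}
```

Every later loop can then use nthreads as its bound instead of recomputing something from argc.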

    for (i=0;i<argc-2;i++){

       str.filename = argv[1];
       str.word = argv[i+2];

       pthread_create(&threads[i], NULL, count, &str);
    }

Similarly to threads, save the path somewhere. Referring to it as argv[1] is bad style which will come back to bite you. Using argv for the words is fine.

However, this is wrong in general. You set up a local struct and pass it to a thread, then immediately change it. So at the end of the day all your threads may end up counting the same word, and the word they are counting can change along the way.
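A minimal sketch of the usual fix: give every thread its own struct, so that preparing the arguments for thread i+1 cannot change what thread i sees. WordTask, measure and measure_all are hypothetical names, and the worker only measures the word's length to keep the example short.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-thread argument: one instance per thread. */
typedef struct {
    const char *word;
    size_t length;      /* result, written only by the owning thread */
} WordTask;

static void *measure(void *arg)
{
    WordTask *task = arg;
    task->length = strlen(task->word);
    return NULL;
}

/* Runs one thread per element of tasks. Returns 0 on success, -1 if the
 * allocation or any pthread_create call failed. */
int measure_all(WordTask *tasks, size_t n)
{
    if (n == 0)
        return 0;

    pthread_t *threads = malloc(n * sizeof *threads);
    if (threads == NULL)
        return -1;

    size_t started = 0;
    while (started < n &&
           pthread_create(&threads[started], NULL, measure,
                          &tasks[started]) == 0)
        started++;

    /* Join exactly the threads that were actually created. */
    for (size_t i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    free(threads);
    return started == n ? 0 : -1;
}
```

The tasks array has to stay alive until every thread has been joined, which is why the caller owns it here.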

    for (i = 0; i < argc-1; ++i)
         pthread_join(threads[i], NULL);

Go figure. You did not have a variable which held the number of threads, and this led to this inconsistency (argc - 1 vs argc - 2).

In general, this issue could have been solved for the most part by properly reading compiler warnings, and avoided altogether if basic good practices had been employed.

Of course, bugs happen regardless, and in such cases you can at the very least narrow them down.

Finally, a few words about the general approach. It is unclear what the point of the exercise was. You actually have to force yourself to use anything more than pthread_create and pthread_join. Let's assume the only requirement is to use threads.

I don't know if they force you to open the file multiple times or what. Opening and reading the file multiple times is not only wasteful, but also opens you up to a situation where the file is replaced and some threads open a different one.

An OK solution would open the file once in main. Once it is open, you would fstat it for the size and mmap it. If for some reason you can't use mmap, you would malloc a large enough buffer and read the file into it.
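A sketch of the malloc fallback, assuming a POSIX system. read_whole_file is a hypothetical helper name; it adds a terminating NUL so that the string functions can be used on the result, which is an extra choice not mentioned in the answer.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: reads the whole file into a NUL-terminated buffer
 * allocated with malloc (caller frees). Stores the number of bytes read
 * in *size_out when size_out is not NULL. Returns NULL on failure. */
char *read_whole_file(const char *path, size_t *size_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size < 0) {
        close(fd);
        return NULL;
    }

    size_t size = (size_t)st.st_size;
    char *buf = malloc(size + 1);       /* +1 for the terminating NUL */
    if (buf == NULL) {
        close(fd);
        return NULL;
    }

    size_t got = 0;
    while (got < size) {                /* read() may return short counts */
        ssize_t n = read(fd, buf + got, size - got);
        if (n < 0) {
            free(buf);
            close(fd);
            return NULL;
        }
        if (n == 0)
            break;                      /* file shrank: keep what we have */
        got += (size_t)n;
    }

    close(fd);
    buf[got] = '\0';
    if (size_out != NULL)
        *size_out = got;
    return buf;
}
```

The file is opened exactly once, so every thread that later receives this buffer is guaranteed to look at the same contents.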

Then every thread gets the address of that buffer, the word to look for, and an address at which to store its counter (each thread gets a different address).

When all threads have exited, you loop over the counters to sum the results.

Either way, no locking is involved.
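The design above can be sketched as follows, assuming the file contents are already in memory as a NUL-terminated string. CountTask, count_word and count_all are hypothetical names, and the search uses strstr, so it counts substring matches rather than whole words.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-thread argument. */
typedef struct {
    const char *text;   /* shared, read-only file contents */
    const char *word;   /* the word this thread looks for */
    size_t count;       /* written only by the owning thread */
} CountTask;

static void *count_word(void *arg)
{
    CountTask *task = arg;
    task->count = 0;
    if (task->word[0] == '\0')
        return NULL;
    for (const char *p = task->text;
         (p = strstr(p, task->word)) != NULL; p++)
        task->count++;
    return NULL;
}

/* Starts one thread per word, joins them, and sums the per-thread counts.
 * tasks and threads must each have room for n elements. Returns the total,
 * or -1 if a thread could not be created. */
long count_all(const char *text, char *const words[],
               CountTask *tasks, pthread_t *threads, size_t n)
{
    size_t started = 0;
    for (; started < n; started++) {
        tasks[started].text = text;
        tasks[started].word = words[started];
        tasks[started].count = 0;
        if (pthread_create(&threads[started], NULL, count_word,
                           &tasks[started]) != 0)
            break;
    }

    long total = 0;
    for (size_t i = 0; i < started; i++) {
        pthread_join(threads[i], NULL);
        total += (long)tasks[i].count;  /* safe: thread i has finished */
    }
    return started == n ? total : -1;
}
```

No mutex is needed because the threads only read the shared buffer, each one writes to its own count field, and main reads a counter only after joining the thread that owns it.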

Answer 3

Reading (up to) 10 characters at a time will (probably) miss some instances of the word being searched for.

The strcmp() always starts at the beginning of the 10 characters.

1) Need to look for the target word at any location in the file.

2) Need to look for the target word at any location in the read-in buffer.

Suggest:

0) clear input buffer
1) input one char at a time, 
accumulating characters in the input buffer,
2) when a word separator found, (for instance a space or EOF)
3) then check if the word matches the target word.  
4) if matches, increment count.   
5) if EOF, then exit, else goto 0
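The steps above can be sketched roughly as below. count_word_in_stream is a hypothetical name, and the sketch makes two assumptions the answer does not state: only whitespace separates words (punctuation stays attached to a word), and words longer than the 255-character buffer are treated as non-matching.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: counts whole-word matches of target in f, reading
 * one character at a time. */
long count_word_in_stream(FILE *f, const char *target)
{
    char word[256];
    size_t len = 0;         /* step 0: the buffer starts out empty */
    int overflow = 0;       /* set when the current word did not fit */
    long count = 0;

    for (;;) {
        int c = fgetc(f);                       /* step 1 */
        if (c == EOF || isspace(c)) {           /* step 2: separator */
            if (len > 0 && !overflow) {
                word[len] = '\0';
                if (strcmp(word, target) == 0)  /* step 3 */
                    count++;                    /* step 4 */
            }
            len = 0;                            /* back to step 0 */
            overflow = 0;
            if (c == EOF)                       /* step 5 */
                break;
        } else if (len < sizeof word - 1) {
            word[len++] = (char)c;
        } else {
            overflow = 1;
        }
    }
    return count;
}
```

Because the comparison happens on complete words, "then" does not count as a match for "the", unlike a plain substring search.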


Answer 1
2 2015-05-03 19:29:29

Answer 2
1 accepted 2015-05-03 21:14:29

Answer 3
0 2015-05-03 19:42:53
