Segmentation fault in a program that counts the number of occurrences of words in a file using threads

Question

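So I have the following problem: implement a program that takes a file name followed by words as arguments. For each word, create a separate thread that counts its occurrences in the given file, and print the total number of occurrences of all the words.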

My code is:

#include <stdio.h>
#include <stdlib.h>
#include <strings.h>
#include <unistd.h>
#include <pthread.h>

pthread_mutex_t mtx; // used by each of the three threads to prevent  other threads from accessing global_sum during their additions

int global_sum = 0;
typedef struct{
                    char* word;
                    char* filename;
}MyStruct;



void *count(void*str)
{
    MyStruct *struc;
    struc = (MyStruct*)str; 
    const char *myfile = struc->filename;

    FILE *f;
    int count=0, j;
    char buf[50], read[100];
    // myfile[strlen(myfile)-1]='\0';
    if(!(f=fopen(myfile,"rt"))){
         printf("Wrong file name");
    }
    else
         printf("File opened successfully\n");
         for(j=0; fgets(read, 10, f)!=NULL; j++){
             if (strcmp(read[j],struc->word)==0)
                count++;
         }

    printf("the no of words is: %d \n",count);  
    pthread_mutex_lock(&mtx); // lock the mutex, to prevent other threads from accessing global_sum
    global_sum += count; // add thread's count result to global_sum
    pthread_mutex_unlock(&mtx); // unlock the mutex, to allow other threads to access the variable
}


int main(int argc, char* argv[]) {
    int i;
    MyStruct str; 

    pthread_mutex_init(&mtx, NULL); // initialize mutex
    pthread_t threads[argc-1]; // declare threads array 

    for (i=0;i<argc-2;i++){

       str.filename = argv[1];  
       str.word = argv[i+2];

       pthread_create(&threads[i], NULL, count, &str); 
    }

    for (i = 0; i < argc-1; ++i)
         pthread_join(threads[i], NULL);

    printf("The global sum is %d.\n", global_sum); // print global sum

    pthread_mutex_destroy(&mtx); // destroy the mutex

    return 0;

}

When I try to run it, I get a segmentation fault. Why is that? Thanks!

Answer 1

In main(), the two loops over i have different bounds:

for (i=0;i<argc-2;i++){
    ...
    pthread_create(&threads[i], NULL, count, &str); 
}

and then

for (i = 0; i < argc-1; ++i)
    pthread_join(threads[i], NULL);

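In the second loop you refer to threads[argc-2], which was never created in the first loop. Passing a pthread_t that does not refer to a started thread to pthread_join() is undefined behavior and can crash.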
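A minimal sketch of the fix, assuming a single variable drives both loops (in the question's program it would be argc - 2, one thread per word). The worker mark_done and the helper spawn_and_join are hypothetical placeholders; the point is only that pthread_join() is called on exactly the threads that pthread_create() actually started.

```c
#include <pthread.h>

#define MAX_THREADS 64   /* arbitrary upper bound for this sketch */

/* Hypothetical placeholder worker: sets the int it receives to 1. */
void *mark_done(void *arg)
{
    *(int *)arg = 1;
    return NULL;
}

/* Starts up to nthreads threads and joins exactly the ones that started.
   Returns the number of threads joined, or -1 if nthreads is out of range. */
int spawn_and_join(int nthreads, int flags[])
{
    pthread_t threads[MAX_THREADS];
    int created = 0;

    if (nthreads < 0 || nthreads > MAX_THREADS)
        return -1;
    while (created < nthreads
           && pthread_create(&threads[created], NULL, mark_done, &flags[created]) == 0)
        created++;
    for (int i = 0; i < created; i++)   /* same bound as the creation loop */
        pthread_join(threads[i], NULL);
    return created;
}
```

Counting the successfully created threads, rather than assuming all of them started, also keeps the join loop safe if pthread_create() fails partway through.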

Answer 2

First of all, your code is very badly formatted; it is not even consistent. You also do not seem to compile with warnings enabled.

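If this is for a university course and they did not tell you how to format code and to compile with warnings, I strongly suggest asking your instructor about it.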

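If you use gcc, add -Wall -Wextra. For coding style, I suggest stealing one from Linux or FreeBSD. Various editors can format code for you, including real editors like vim (worth a try even if it looks intimidating).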

Your coding style contributes to the problem.

void *count(void*str)
{
    MyStruct *struc;
    struc = (MyStruct*)str;
    const char *myfile = struc->filename;

    FILE *f;
    int count=0, j;
    char buf[50], read[100];

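buf is unused, which you would have learned if you had warnings enabled. read is a bad name: it shadows the POSIX read() function declared in <unistd.h>, which the program includes.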

    // myfile[strlen(myfile)-1]='\0';
    if(!(f=fopen(myfile,"rt"))){
         printf("Wrong file name");
    }
    else

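Because you do not return (with an appropriate status) when fopen() fails, you set yourself up to execute code that should never run. Sure enough, your "else" clause is broken: you are missing braces, so the for loop below executes even when opening the file failed. fgets() is then called with a NULL FILE pointer, which is undefined behavior and a likely source of the crash.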
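A sketch of the thread function with an explicit early return, assuming a hypothetical CountArg struct that carries the result back to the caller (instead of a global plus a mutex). It counts whitespace-separated tokens with fscanf(), which is only one possible matching rule: punctuation stays attached to a token, so "the," does not match "the".

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical per-thread argument; the result is written back into it. */
typedef struct {
    const char *filename;
    const char *word;
    long count;        /* set to -1 when the file cannot be opened */
} CountArg;

void *count_thread(void *arg)
{
    CountArg *a = arg;
    FILE *f = fopen(a->filename, "r");

    if (f == NULL) {
        perror(a->filename);   /* report why fopen failed */
        a->count = -1;
        return NULL;           /* bail out: nothing below may use f */
    }

    /* f is valid from here on; no dangling else to get wrong */
    char token[100];
    a->count = 0;
    while (fscanf(f, "%99s", token) == 1)   /* one whitespace-separated token */
        if (strcmp(token, a->word) == 0)
            a->count++;

    fclose(f);
    return NULL;
}
```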

         printf("File opened successfully\n");
         for(j=0; fgets(read, 10, f)!=NULL; j++){

The 10 looks like a typo; you probably meant 100. That would not happen if you used sizeof read.
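A small sketch of the sizeof fix; count_in_lines is a hypothetical helper that counts non-overlapping substring matches line by line. The fgets() limit can no longer drift away from the array size, but lines longer than the buffer are still split into chunks, so a word straddling two chunks is missed, and substring matching also counts "the" inside "other".

```c
#include <stdio.h>
#include <string.h>

/* Counts non-overlapping occurrences of word in f, one fgets() chunk at a time. */
long count_in_lines(FILE *f, const char *word)
{
    char line[100];
    long n = 0;
    size_t wlen = strlen(word);

    if (wlen == 0)
        return 0;
    while (fgets(line, sizeof line, f) != NULL) {   /* limit always matches the array */
        for (const char *p = line; (p = strstr(p, word)) != NULL; p += wlen)
            n++;
    }
    return n;
}
```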

             if (strcmp(read[j],struc->word)==0)
                count++;
         }

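It is unclear what you are trying to do here. It seems you want to strcmp starting at read[0], then at read[1], and so on. But on every iteration you read new data over what was in the buffer and then advance the index by one. That makes no sense. Finally, you are doing it wrong anyway: read[j] is a char, not an address, so it cannot be passed to strcmp(), which the compiler would again tell you if you asked it to.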

In any case, the strcmp approach is pretty bad. Try an approach that matches the first character and works from there.
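A sketch of the first-character approach on an in-memory buffer; count_occurrences is a hypothetical name. It counts every position where the word appears as a substring, including overlapping matches, and does not check word boundaries.

```c
#include <stddef.h>
#include <string.h>

/* Counts (possibly overlapping) occurrences of word in buf[0..len).
   Only positions whose first character matches are compared further. */
size_t count_occurrences(const char *buf, size_t len, const char *word)
{
    size_t wlen = strlen(word), n = 0;

    if (wlen == 0 || wlen > len)
        return 0;
    for (size_t i = 0; i + wlen <= len; i++) {
        if (buf[i] != word[0])
            continue;                          /* cheap first-character filter */
        if (memcmp(buf + i, word, wlen) == 0)  /* then compare the whole word */
            n++;
    }
    return n;
}
```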

int main(int argc, char* argv[]) {

Nonstandard placement of the '*'. Use char *argv[] instead.

    int i;
    MyStruct str;

    pthread_mutex_init(&mtx, NULL); // initialize mutex
    pthread_t threads[argc-1]; // declare threads array

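Strongly discouraged. Validate the arguments first, then compute a dedicated variable that holds the number of threads. Only at that point allocate the array. (With argc == 1, threads[argc-1] is a zero-length variable-length array, which is undefined behavior.)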
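A sketch of validating the command line before sizing anything, assuming the question's FILE WORD... argument layout; alloc_threads is a hypothetical helper and the usage message is illustrative.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* Validates the command line and allocates one pthread_t per word.
   Returns the array (NULL on error) and stores the thread count in *nthreads. */
pthread_t *alloc_threads(int argc, char *argv[], int *nthreads)
{
    if (argc < 3) {
        fprintf(stderr, "usage: %s FILE WORD...\n", argc > 0 ? argv[0] : "prog");
        return NULL;
    }
    *nthreads = argc - 2;                 /* argv[2] .. argv[argc-1] are the words */
    pthread_t *threads = malloc((size_t)*nthreads * sizeof *threads);
    if (threads == NULL)
        perror("malloc");
    return threads;                       /* caller frees it after joining */
}
```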

    for (i=0;i<argc-2;i++){

       str.filename = argv[1];
       str.word = argv[i+2];

       pthread_create(&threads[i], NULL, count, &str);
    }

As with the thread count, save the path in a variable. Referring to it as argv[1] is bad style and will bite you again. Using argv for the words is fine.

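More importantly, this is just wrong. You set up a single local struct, pass its address to each thread, and then immediately modify it for the next one. So by the time the threads read it, they may all be counting the same word, and the word they point to can change while they run.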
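A sketch of giving each thread its own argument; WordJob, job_main and run_jobs are hypothetical names. job_main only records the length of its word as a stand-in for the real counting, which makes any mix-up between threads easy to detect.

```c
#include <pthread.h>
#include <string.h>

typedef struct {
    const char *word;   /* this thread's word; never changed after creation */
    size_t result;      /* written only by the owning thread */
} WordJob;

/* Placeholder for the real counting work: records the word's length. */
static void *job_main(void *arg)
{
    WordJob *job = arg;
    job->result = strlen(job->word);
    return NULL;
}

/* One WordJob per thread, so main never overwrites an argument that a
   running thread may still be reading. Returns 0 if all n threads ran. */
int run_jobs(WordJob jobs[], pthread_t threads[], int n)
{
    int created;

    for (created = 0; created < n; created++)
        if (pthread_create(&threads[created], NULL, job_main, &jobs[created]) != 0)
            break;
    for (int i = 0; i < created; i++)
        pthread_join(threads[i], NULL);
    return created == n ? 0 : -1;
}
```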

    for (i = 0; i < argc-1; ++i)
         pthread_join(threads[i], NULL);

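Go figure. You do not have a variable holding the number of threads, which led to this inconsistency (argc-1 vs argc-2).

In general, much of this would be caught by actually reading the compiler warnings, and most of it avoided by adopting basic good practices.

Of course, bugs happen anyway, but then you can at least narrow them down.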

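Finally, a few words about the general approach. It is unclear what the purpose of this exercise is. Realistically, you would have to go out of your way to need anything beyond pthread_create and pthread_join. Let's assume the only requirement is to use threads.

I don't know whether they force you to open the file multiple times. Opening and reading it multiple times is not only wasteful, it also leaves you exposed to the case where the file is replaced in the meantime and some threads end up opening a different file.

A good solution is to open the file once, in main. Once it is open, you can fstat it for the size and mmap it. If for some reason you cannot use mmap, allocate a large enough buffer and read the file into it.

Then every thread gets the address of that buffer, the word to look for, and the address where it should store its counter (each thread gets a different address).

When all threads have exited, you loop over the counters to sum them.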

Either way, no locking is involved.
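A sketch of that design, assuming POSIX open(), fstat() and mmap() are available. SearchJob, search and count_words_in_file are hypothetical names, the 64-word cap is arbitrary, and the matching rule (case-sensitive, with non-alphanumeric characters as word boundaries) is one choice among several. Each thread writes only its own counts[i] slot, and the totals are summed after pthread_join(), so no mutex is needed.

```c
#define _POSIX_C_SOURCE 200809L
#include <ctype.h>
#include <fcntl.h>
#include <pthread.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

typedef struct {
    const char *buf;    /* shared, read-only file contents */
    size_t len;
    const char *word;
    long *counter;      /* distinct slot per thread: no lock needed */
} SearchJob;

/* Whole-word match: the characters around a match must not be alphanumeric. */
static void *search(void *arg)
{
    const SearchJob *j = arg;
    size_t wlen = strlen(j->word);
    long n = 0;

    if (wlen > 0)
        for (size_t i = 0; i + wlen <= j->len; i++)
            if (memcmp(j->buf + i, j->word, wlen) == 0
                && (i == 0 || !isalnum((unsigned char)j->buf[i - 1]))
                && (i + wlen == j->len || !isalnum((unsigned char)j->buf[i + wlen])))
                n++;
    *j->counter = n;
    return NULL;
}

/* Returns the total number of matches of all words, or -1 on error. */
long count_words_in_file(const char *path, char *words[], int nwords)
{
    enum { MAX_WORDS = 64 };
    pthread_t th[MAX_WORDS];
    SearchJob jobs[MAX_WORDS];
    long counts[MAX_WORDS];
    struct stat st;
    void *map = NULL;
    long total = 0;
    int created = 0;

    if (nwords < 0 || nwords > MAX_WORDS)
        return -1;
    int fd = open(path, O_RDONLY);           /* the file is opened exactly once */
    if (fd == -1)
        return -1;
    if (fstat(fd, &st) == -1) {
        close(fd);
        return -1;
    }
    size_t len = (size_t)st.st_size;
    if (len > 0) {                           /* mmap rejects a zero length */
        map = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
        if (map == MAP_FAILED) {
            close(fd);
            return -1;
        }
    }
    close(fd);                               /* the mapping stays valid */

    for (; created < nwords; created++) {
        jobs[created] = (SearchJob){ map, len, words[created], &counts[created] };
        if (pthread_create(&th[created], NULL, search, &jobs[created]) != 0)
            break;
    }
    for (int i = 0; i < created; i++) {
        pthread_join(th[i], NULL);
        total += counts[i];                  /* safe: the thread has finished */
    }
    if (map != NULL)
        munmap(map, len);
    return created == nwords ? total : -1;
}
```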

Answer 3

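Reading at most 10 characters at a time (fgets() actually stores at most 9 plus the terminating '\0') will miss some instances of the word being searched for.

strcmp() always compares from the start of the chunk that was read.

1) The target word needs to be found anywhere in the file.

2) The target word needs to be found anywhere in the buffer that was read in.

Suggestion: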

0) clear input buffer
1) input one char at a time, 
accumulating characters in the input buffer,
2) when a word separator found, (for instance a space or EOF)
3) then check if the word matches the target word.  
4) if matches, increment count.   
5) if EOF, then exit, else goto 0
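The steps above, sketched in C as a hypothetical count_word_stream that reads from an already opened FILE *. Only whitespace and EOF act as separators here, so "the," is not counted as "the"; treating punctuation as a separator would be a small change to the isspace() test. The 256-byte buffer is an arbitrary assumption, and the target is assumed to be shorter than it.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Reads one character at a time, collects a word, and compares it with
   target whenever whitespace or EOF ends it. */
long count_word_stream(FILE *fp, const char *target)
{
    char word[256];
    size_t len = 0;
    int too_long = 0, c;
    long count = 0;

    do {
        c = fgetc(fp);
        if (c == EOF || isspace(c)) {           /* step 2: word separator */
            word[len] = '\0';
            if (len > 0 && !too_long && strcmp(word, target) == 0)
                count++;                        /* steps 3 and 4 */
            len = 0;                            /* step 0: clear the buffer */
            too_long = 0;
        } else if (len < sizeof word - 1) {
            word[len++] = (char)c;              /* step 1: accumulate */
        } else {
            too_long = 1;                       /* word longer than the buffer */
        }
    } while (c != EOF);                         /* step 5 */
    return count;
}
```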


Problem description

3 solutions

Solution 1
Score 2, 2015-05-03 19:29:29

Solution 2
Score 1, accepted, 2015-05-03 21:14:29

Solution 3
Score 0, 2015-05-03 19:42:53
