
Why does this recursive pthread_create call result in data race?

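I am getting a data race when calling pthread_create() recursively. I don't know whether the recursion causes the problem, but the race never seems to occur in the first iteration; it mostly occurs in the second and very rarely in the third.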

When using libgc, there are memory corruption symptoms, such as segmentation faults, that are consistent with a data race.

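The following program is a minimal example that illustrates the problem. I am not using libgc in the example, since only the data race is the topic of this question.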

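The data race is visible when running Valgrind with the Helgrind tool. The problems reported vary slightly between runs, and sometimes no problem is reported at all.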

I am running Linux Mint 17.2. The gcc version is (Ubuntu 4.8.4-2ubuntu1~14.04) 4.8.4.

The following example 'main.c' reproduces the problem. It iterates over a linked list, printing each element value in a separate thread:

#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>


typedef struct List {
  int head ;
  struct List* tail ;
} List ;

// create a list element with an integer head and a tail
List* new_list( int head, List* tail ) {
  List* l = (List*)malloc( sizeof( List ) ) ;
  l->head = head ;
  l->tail = tail ;
  return l ;
}


// create a thread and start it
void call( void* (*start_routine)( void* arg ), void* arg ) {
  pthread_t* thread = (pthread_t*)malloc( sizeof( pthread_t ) ) ;

  if ( pthread_create( thread, NULL, start_routine, arg ) ) {
    exit( -1 ) ;
  }

  pthread_detach( *thread ) ;
  return ;
}


void print_list( List* l ) ;

// start routine for thread
void* print_list_start_routine( void* arg ) {

  // verify that the list is not empty ( = NULL )
  // print its head
  // print the rest of it in a new thread
  if ( arg ) {

    List* l = (List*)arg ;

    printf( "%d\n", l->head ) ;

    print_list( l->tail ) ;

  }

  return NULL ;
}

// print elements of a list with one thread for each element printed
// threads are created recursively
void print_list( List* l ) {
  call( print_list_start_routine, (void*)l ) ;
}


int main( int argc, const char* argv[] ) {

  List* l = new_list( 1, new_list( 2, new_list( 3, NULL ) ) ) ;

  print_list( l ) ;  

  // wait for all threads to finish
  pthread_exit( NULL ) ;

  return 0 ;
}

This is the 'makefile':

CC=gcc

a.out: main.o
    $(CC) -pthread main.o

main.o: main.c
    $(CC) -c -g -O0 -std=gnu99 -Wall main.c

clean:
    rm *.o a.out

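This is the most common output from Helgrind. Note that the lines containing only a single number (1, 2 and 3) are output from the program, not from Helgrind: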

$ valgrind --tool=helgrind ./a.out 
==13438== Helgrind, a thread error detector
==13438== Copyright (C) 2007-2013, and GNU GPL'd, by OpenWorks LLP et al.
==13438== Using Valgrind-3.10.0.SVN and LibVEX; rerun with -h for copyright info
==13438== Command: ./a.out
==13438== 
1
2
==13438== ---Thread-Announcement------------------------------------------
==13438== 
==13438== Thread #3 was created
==13438==    at 0x515543E: clone (clone.S:74)
==13438==    by 0x4E44199: do_clone.constprop.3 (createthread.c:75)
==13438==    by 0x4E458BA: pthread_create@@GLIBC_2.2.5 (createthread.c:245)
==13438==    by 0x4C30C90: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==13438==    by 0x4007EB: call (main.c:25)
==13438==    by 0x400871: print_list (main.c:58)
==13438==    by 0x40084D: print_list_start_routine (main.c:48)
==13438==    by 0x4C30E26: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==13438==    by 0x4E45181: start_thread (pthread_create.c:312)
==13438==    by 0x515547C: clone (clone.S:111)
==13438== 
==13438== ---Thread-Announcement------------------------------------------
==13438== 
==13438== Thread #2 was created
==13438==    at 0x515543E: clone (clone.S:74)
==13438==    by 0x4E44199: do_clone.constprop.3 (createthread.c:75)
==13438==    by 0x4E458BA: pthread_create@@GLIBC_2.2.5 (createthread.c:245)
==13438==    by 0x4C30C90: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==13438==    by 0x4007EB: call (main.c:25)
==13438==    by 0x400871: print_list (main.c:58)
==13438==    by 0x4008BB: main (main.c:66)
==13438== 
==13438== ----------------------------------------------------------------
==13438== 
==13438== Possible data race during write of size 1 at 0x602065F by thread #3
==13438== Locks held: none
==13438==    at 0x4C368F5: mempcpy (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==13438==    by 0x4012CD6: _dl_allocate_tls_init (dl-tls.c:436)
==13438==    by 0x4E45715: pthread_create@@GLIBC_2.2.5 (allocatestack.c:252)
==13438==    by 0x4C30C90: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==13438==    by 0x4007EB: call (main.c:25)
==13438==    by 0x400871: print_list (main.c:58)
==13438==    by 0x40084D: print_list_start_routine (main.c:48)
==13438==    by 0x4C30E26: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==13438==    by 0x4E45181: start_thread (pthread_create.c:312)
==13438==    by 0x515547C: clone (clone.S:111)
==13438== 
==13438== This conflicts with a previous read of size 1 by thread #2
==13438== Locks held: none
==13438==    at 0x51C10B1: res_thread_freeres (in /lib/x86_64-linux-gnu/libc-2.19.so)
==13438==    by 0x51C1061: __libc_thread_freeres (in /lib/x86_64-linux-gnu/libc-2.19.so)
==13438==    by 0x4E45199: start_thread (pthread_create.c:329)
==13438==    by 0x515547C: clone (clone.S:111)
==13438== 
3
==13438== 
==13438== For counts of detected and suppressed errors, rerun with: -v
==13438== Use --history-level=approx or =none to gain increased speed, at
==13438== the cost of reduced accuracy of conflicting-access information
==13438== ERROR SUMMARY: 8 errors from 1 contexts (suppressed: 56 from 48)

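As Pooja Nilangekar mentioned, replacing pthread_detach() with pthread_join() removes the race. However, detaching the threads is a requirement, so the goal is to detach the threads cleanly. In other words, keep pthread_detach() while removing the race.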

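There seems to be some unintentional sharing between the threads. It may be related to what is discussed here: http://www.domaigne.com/blog/computing/joinable-and-detached-threads/ and in particular to the bug in the example there.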

I still don't understand what is really happening.

Replace the line pthread_detach( *thread ) ; with pthread_join( *thread, NULL ) ; This ensures that the child terminates before the parent, and hence there is no seg fault.
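A minimal sketch of that suggestion, assuming the List type from the question. The names call_and_join, print_list_joined, print_joined_start_routine and demo_joined, as well as the printed counter, are made up here for illustration and are not part of the original program.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct List {
  int head;
  struct List *tail;
} List;

/* Added for illustration only: counts the elements printed so far. */
static int printed = 0;

static void *print_joined_start_routine(void *arg);

/* Like call() in the question, but waits for the new thread
   instead of detaching it. */
static void call_and_join(void *(*start_routine)(void *), void *arg) {
  pthread_t thread; /* no malloc needed: the handle is only used here */

  if (pthread_create(&thread, NULL, start_routine, arg) != 0) {
    exit(EXIT_FAILURE);
  }
  if (pthread_join(thread, NULL) != 0) {
    exit(EXIT_FAILURE);
  }
}

static void print_list_joined(List *l) {
  call_and_join(print_joined_start_routine, l);
}

/* Prints the head, then prints the tail in a new thread and waits for it. */
static void *print_joined_start_routine(void *arg) {
  if (arg) {
    List *l = arg;
    printf("%d\n", l->head);
    printed++; /* ordered by pthread_create/pthread_join, so not a race */
    print_list_joined(l->tail);
  }
  return NULL;
}

/* Hypothetical helper: prints a three-element list and returns
   the number of elements printed. */
static int demo_joined(void) {
  List c = {3, NULL};
  List b = {2, &c};
  List a = {1, &b};
  printed = 0;
  print_list_joined(&a); /* returns only after every thread has finished */
  return printed;
}
```

Note the cost: every thread blocks in pthread_join until all of its successors have finished, so nothing runs in parallel any more.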

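The helgrind output does not match your source. According to helgrind there is a pthread_create call on line 25, but all I see there is exit(-1). I assume you forgot to add a line at the beginning of the source.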

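That being said, I cannot reproduce the helgrind output at all. I ran your program in a while loop hoping to get the same error, but nada. That is the nasty thing about races: you never know when they occur, and they are hard to track down.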

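Then there is another thing: res_thread_freeres is called whenever resolver state information (DNS) is to be released. In fact, it is called without even being checked. _dl_allocate_tls_init is used for thread-local storage (TLS) and makes sure that certain resources and metadata (custom stack, cleanup information, etc.) are allocated and stored before your function gets control of the thread.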
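For readers unfamiliar with TLS, here is a small sketch of what a thread-local variable looks like from the program's side. It assumes the __thread storage-class extension of GCC and Clang; the names tls_value, tls_worker and tls_demo are made up for illustration. Each thread gets its own copy of such a variable, and the runtime has to set that copy up while the thread is being created, which is presumably the work the _dl_allocate_tls_init frame in the trace is doing.

```c
#include <pthread.h>

/* One independent copy per thread (GCC/Clang extension). */
static __thread int tls_value = 0;

/* Writes to this thread's copy and reports what it sees. */
static void *tls_worker(void *out) {
  tls_value = 42; /* changes only the worker's copy */
  *(int *)out = tls_value;
  return NULL;
}

/* Returns 1 if the worker saw its own value and the caller's copy
   was left untouched, 0 otherwise, and -1 on a pthread error. */
static int tls_demo(void) {
  pthread_t t;
  int seen_by_worker = 0;

  tls_value = 7; /* the calling thread's copy */
  if (pthread_create(&t, NULL, tls_worker, &seen_by_worker) != 0) {
    return -1;
  }
  if (pthread_join(t, NULL) != 0) {
    return -1;
  }
  return seen_by_worker == 42 && tls_value == 7;
}
```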

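This suggests there is a race between creating a new thread and killing an old one. Since you detach your threads, a parent may die before its child has finished. In that case, synchronizing the exit of the threads (which, as Pooja Nilangekar pointed out, can be done by joining them) may solve the problem, since pthread_join stalls until the thread has finished, thereby synchronizing the child/parent deallocations.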

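If you still want to go parallel, what you could do is take care of the memory yourself. See pthread_attr_setstack specifically. Since I cannot reproduce the error, I have not verified whether this really works. Also, this approach requires you to know the number of threads you are going to have. If you try to reallocate memory that is currently in use by a thread, you are playing with fire.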
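A rough sketch of what that could look like, under stated assumptions: the thread count is bounded by a made-up MAX_THREADS, 1 MiB of stack per thread is enough, and every stack is page-aligned and never freed or reused while the program runs. The names spawn_on_own_stack, demo_worker and run_demo are hypothetical, and the semaphore exists only so the demo can tell when the detached workers are done. Whether this makes the Helgrind report go away is untested.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <semaphore.h>
#include <stdlib.h>
#include <unistd.h>

#define MAX_THREADS 8            /* assumption: thread count known up front */
#define STACK_SIZE (1024 * 1024) /* assumption: 1 MiB per thread is enough */

static sem_t done; /* posted once by every worker */

/* Hypothetical worker: does nothing except signal completion. */
static void *demo_worker(void *arg) {
  (void)arg;
  sem_post(&done);
  return NULL;
}

/* Starts a detached thread on a caller-owned stack.
   Returns 0 on success, otherwise a pthread error number. */
static int spawn_on_own_stack(void *stack, void *(*fn)(void *), void *arg) {
  pthread_attr_t attr;
  pthread_t thread;
  int rc = pthread_attr_init(&attr);

  if (rc != 0) {
    return rc;
  }
  rc = pthread_attr_setstack(&attr, stack, STACK_SIZE);
  if (rc == 0) {
    rc = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
  }
  if (rc == 0) {
    rc = pthread_create(&thread, &attr, fn, arg);
  }
  pthread_attr_destroy(&attr);
  return rc;
}

/* Spawns up to n workers, waits until each has signalled completion,
   and returns how many were started (-1 if the semaphore setup fails). */
static int run_demo(int n) {
  void *stacks[MAX_THREADS];
  long page = sysconf(_SC_PAGESIZE);
  int started = 0;

  if (n > MAX_THREADS) {
    n = MAX_THREADS;
  }
  if (page <= 0 || sem_init(&done, 0, 0) != 0) {
    return -1;
  }
  for (int i = 0; i < n; i++) {
    if (posix_memalign(&stacks[i], (size_t)page, STACK_SIZE) != 0) {
      break;
    }
    if (spawn_on_own_stack(stacks[i], demo_worker, NULL) != 0) {
      break;
    }
    started++;
  }
  for (int i = 0; i < started; i++) {
    sem_wait(&done);
  }
  /* The stacks are deliberately not freed: a detached worker may still be
     running its last instructions on its stack after sem_post(). */
  return started;
}
```

The deliberate leak is the price of detaching: with no join, there is no portable point at which the program knows a stack is no longer in use.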

Just a note (I cannot reply to comments): I get very similar helgrind output without recursion. I spawn a thread using a lambda and detach it.

==9060== Possible data race during write of size 1 at 0x126CE63F by thread #1
==9060== Locks held: none
==9060==    at 0x4C36D85: mempcpy (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==9060==    by 0x4012D66: _dl_allocate_tls_init (dl-tls.c:436)
==9060==    by 0x6B04715: get_cached_stack (allocatestack.c:252)
==9060==    by 0x6B04715: allocate_stack (allocatestack.c:501)
==9060==    by 0x6B04715: pthread_create@@GLIBC_2.2.5 (pthread_create.c:500)
==9060==    by 0x4C30E0D: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==9060==    by 0x6359D23: std::thread::_M_start_thread(std::shared_ptr<std::thread::_Impl_base>, void (*)()) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==9060==    by 0x404075: thread<main()::<lambda()> > (thread:138)
==9060==    by 0x404075: main (test1.cpp:162)
==9060== 
==9060== This conflicts with a previous read of size 8 by thread #2
==9060== Locks held: none
==9060==    at 0x6E83931: res_thread_freeres (in /lib/x86_64-linux-gnu/libc-2.19.so)
==9060==    by 0x6E838E1: __libc_thread_freeres (in /lib/x86_64-linux-gnu/libc-2.19.so)
==9060==    by 0x6B0419B: start_thread (pthread_create.c:329)
==9060==    by 0x6E1803C: clone (clone.S:111)
==9060==  Address 0x126ce63f is not stack'd, malloc'd or on a free list

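However, I do this in a loop, and it is only reported once. This suggests that something in the TLS mechanism itself may be triggering the report.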

