Why is multithreading slower?

Question

I implemented merge sort over three linked lists of 1000000 nodes each, once with multiple processes and once with multiple threads. When I compare the real (wall-clock) time of the two programs, the multithreaded version is slower. Why is that?

main function in process.c

    /* Insert nodes */
    Node* tmp = NULL;   
    int num;    
    for( int i = 0; i < MAX; i++ )
    {
        fscanf(fread,"%d",&num);    
        tmp = createNode(num , i ); 
        insertNode( &list1.head, &list1.tail, tmp );
        tmp = createNode(num , i ); 
        insertNode( &list2.head, &list2.tail, tmp );    
        tmp = createNode(num , i );
        insertNode( &list3.head, &list3.tail, tmp );    
    }
    /* Every node is now owned by a list, so nothing is left to free here. */
    fclose(fread);  

    if ((t1 = times(&mytms)) == -1) {
        perror("times 1");
        exit(1);
    }

    pid1= fork();   
    if(pid1==0){
        mergeSort( &list1.head );   
        file_output(&list1);    
        freeAll( list1.head );
        exit(1);    
    }
    pid2= fork();   
    if(pid2==0){
        mergeSort( &list2.head );   
        file_output(&list2);    
        freeAll( list2.head );  
        exit(2);    
    }
    pid3 = fork();
    if(pid3==0){
        mergeSort( &list3.head );   
        file_output(&list3);    
        freeAll( list3.head );  
        exit(3);    
    }

    wait(&status);  
    wait(&status);
    wait(&status);

    if ((t2 = times(&mytms)) == -1) {   
        perror("times 2");
        exit(1);
    }

    printf("Real time : %.5f sec\n", (double)(t2 - t1) / CLK_TCK);
    printf("User time : %.5f sec\n", (double)mytms.tms_utime / CLK_TCK);
    printf("System time : %.5f sec\n", (double)mytms.tms_stime / CLK_TCK);

Result: real time 1.65 sec

main in thread.c

   /* Insert nodes */
   Node* tmp = NULL;   
   int num;           

   for( int i = 0; i < MAX; i++ )
   {
      fscanf(fread,"%d",&num); 
      tmp = createNode(num , i ); 
      insertNode( &list1.head, &list1.tail, tmp );  
      tmp = createNode(num , i );  
      insertNode( &list2.head, &list2.tail, tmp );  
      tmp = createNode(num , i );  
      insertNode( &list3.head, &list3.tail, tmp );  
   }

   /* tmp is the tail of list3 at this point, so it must not be freed before sorting. */
   fclose(fread);  

   if ((t1 = times(&mytms)) == -1) {
        perror("times 1");
        exit(1);
   }

   pthread_create( &t_id1, NULL, thread_func, &list1 );
   pthread_create( &t_id2, NULL, thread_func, &list2 );
   pthread_create( &t_id3, NULL, thread_func, &list3 );

   pthread_join( t_id1, (void*)&status );
   pthread_join( t_id2, (void*)&status );
   pthread_join( t_id3, (void*)&status );

   if ((t2 = times(&mytms)) == -1) {
        perror("times 2");
        exit(1);
   }

   printf("Real time : %.5f sec\n", (double)(t2 - t1) / CLK_TCK);
   printf("User time : %.5f sec\n", (double)mytms.tms_utime / CLK_TCK);  
   printf("System time : %.5f sec\n", (double)mytms.tms_stime / CLK_TCK);

Result: real time 2.27 sec

Answer 1

Why is multithreading slower?

It is processor-specific and tied to the number of cores, the organization of the CPU caches, their cache coherence, and your RAM. See also the tests and benchmarks on https://www.phoronix.com/; the result won't be the same on an Intel Core i7 10700K and on an AMD Ryzen 9 3900X (whose prices are close).

It is also both compiler- and optimization-specific. Read the Dragon book and a good book on computer architecture.

It also depends upon your particular operating system and your particular C standard library (e.g. GNU glibc is not the same as musl-libc), and glibc 2.31 could have different performance than glibc 2.30 on the same computer. Read Advanced Linux Programming, pthreads(7), nptl(7), numa(7), time(7), madvise(2), and syscalls(2).

Did you try on a recent Linux with a recent GCC 10 invoked as gcc -Wall -O3 -mtune=native at least?

You could use proc(5) then hwinfo on Linux to query your hardware.

You might be interested in OpenCL, OpenMP, or OpenACC, and you should read about the optimization options of your particular C compiler. For recent GCC, see this. You could even customize a recent GCC with your own GCC plugins to improve optimizations, and you could try a recent Clang or icc compiler.

See also the MILEPOST GCC project and the CTuning one. Read also this draft report. Attend ACM SIGPLAN and SIGOPS conferences. Contact computer science academics near you.

(You could probably get a PhD while understanding the answer to your question.)

-1 ACCEPTED 2020-06-10 16:45:44