MPI Segmentation Fault (11) And Address not mapped (1) when running large computations

Question

I am running large computations that involve many nested loops, and I am using the MPI send and recv features to parallelize some function calculations. MPI acts only on these functions. The runs (the job loop executes every program configuration) are automated to execute one at a time and do not use MPI. Each configuration N is executed twice, once with v1_JE and once with v2_JE; each of these for loops is marked in the code by a single dashed comment line.

The problem occurs at around 60% of the total program execution: MPI signals Segmentation fault (11) and Address not mapped (1) and aborts the program.

The configuration of my computer is:
• OS: macOS High Sierra 10.13.6
• Processor and Memory: Intel® Core™ 2 Duo P8600; 2x2GB sticks 1666MHz
• Compiler: mpicc from OpenMPI 4.1.2, invoked from a makefile as: mpicc -std=c99 -w -Wall -I$(INC_DIR) -c lgvpolymer20.c
• MPI Version: 3.1

Below is the error I get from the terminal and the code I am using that results in this error:

The error I get from the Terminal (see also the attached screenshot):

[MacBook-de-Jailson:48014] *** Process received signal ***
[MacBook-de-Jailson:48014] Signal: Segmentation fault: 11 (11)
[MacBook-de-Jailson:48014] Signal code: Address not mapped (1)
[MacBook-de-Jailson:48014] Failing at address: 0x68
[MacBook-de-Jailson:48014] [ 0] 0   libsystem_platform.dylib 0x00007fff5fca2f5a _sigtramp + 26
[MacBook-de-Jailson:48014] [ 1] 0   ???                                 0x0000000000000000 0x0 + 0
[MacBook-de-Jailson:48014] [ 2] 0   libsystem_c.dylib                   0x00007fff5fa27728 vfprintf_l + 28
[MacBook-de-Jailson:48014] [ 3] 0   libsystem_c.dylib                   0x00007fff5fa203c9 fprintf + 176
[MacBook-de-Jailson:48014] [ 4] 0   lgvpolymer20                        0x00000001083c8bb7 main + 5607
[MacBook-de-Jailson:48014] [ 5] 0   libdyld.dylib                       0x00007fff5f994015 start + 1
[MacBook-de-Jailson:48014] [ 6] 0   ???                                 0x0000000000000001 0x0 + 1
[MacBook-de-Jailson:48014] *** End of error message ***
Segmentation fault: 11
logout
Saving session...
...copying shared history...
...saving history...truncating history files...
...completed.
                   
[Processo concluído]

The Code I am using:

int main(int argc, char *argv[]){

    int job, i, n, s, print_step;
    double v, vJE= 0.0, dH_dLambda, Energy, delta_t, barsize = 50, max = steps, t = 0.0;
    int percent = (s / tmax * 100), chras = (s * barsize / max);
    double Rg, Cm, Work, POLYEXP, POLYDM, POLYSIZE,INTERAC;
    static double *Fx, *Fy, *Fz, *vx, *vy, *vz, *x, *y, *z;
    static char *filename_1 = NULL, *filename_2 = NULL;
    int MASTER_THREAD_INIT, MASTER_THREAD_FINAL = 0;
    char maquina[MPI_MAX_PROCESSOR_NAME];
    int versao, subversao,  aux, ret;
    double Avg_Velocity, Avg_Energy;
    double t_inicial, t_final;
    float timestep = 0.0;
    int rank, n_procs;
    int tag = 1;
    
    Fx = (double *) malloc((size_t) (N_CONFIGURATIONS*steps*sizeof(double)));
    Fy = (double *) malloc((size_t) (N_CONFIGURATIONS*steps*sizeof(double)));
    Fz = (double *) malloc((size_t) (N_CONFIGURATIONS*steps*sizeof(double)));
    vx = (double *) malloc((size_t) (N_CONFIGURATIONS*steps*sizeof(double)));
    vy = (double *) malloc((size_t) (N_CONFIGURATIONS*steps*sizeof(double)));
    vz = (double *) malloc((size_t) (N_CONFIGURATIONS*steps*sizeof(double)));
    x = (double *) malloc((size_t) (N_CONFIGURATIONS*steps*sizeof(double)));
    y = (double *) malloc((size_t) (N_CONFIGURATIONS*steps*sizeof(double)));
    z = (double *) malloc((size_t) (N_CONFIGURATIONS*steps*sizeof(double)));

    if(filename_1 == NULL && filename_2 == NULL){
        filename_1 = (char *) malloc((size_t) (N_CONFIGURATIONS*steps*sizeof(char) + 1));
        filename_2 = (char *) malloc((size_t) (N_CONFIGURATIONS*steps*sizeof(char) + 1));
    }

    /* G e n e r a t i n g  T h e  I n i t i a l s  C o n d i t i o n s  */
    if (i == 0){
        for(i=0, s=1; i<Npol; i++){
            x[i] = y[i] = vx[i] = vy[i] = vz[i] = 0.0;
            z[i] = R*(double)( (-((Npol-1)/2+m_trans)) + i);
            if (i == (Npol-1)/2+m_trans){ x[(Npol-1)/2+m_trans] = y[(Npol-1)/2+m_trans] = z[(Npol-1)/2+m_trans] = R*0.0;}
            if (i == (Npol-1)/2+2+(m_trans)){ x[(Npol-1)/2+2+(m_trans)] = y[(Npol-1)/2+2+(m_trans)] = R*1.0; z[(Npol-1)/2+2+(m_trans)] = R*2.0;}
            if (i == (Npol-1)/2-2+(m_trans)){ x[(Npol-1)/2-2+(m_trans)] = y[(Npol-1)/2-2+(m_trans)] = -R*1.0; z[(Npol-1)/2-2+(m_trans)] = -R*2.0;}
        }
    }

    MPI_Init(&argc,&argv);
    MPI_Get_version(&versao,&subversao);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(maquina, &aux);
    MPI_Comm_size(MPI_COMM_WORLD, &n_procs);
    t_inicial = MPI_Wtime();

    FILE *output1, *output2;

    /*---------------------------------------------------------------------------------------------------
        --B e l l o w  T h e  M a i n  L o o p  ( O v e r  T h e  C o n f i g u r a t i o n s  (N)  )    
        T h a t  P r o d u c e s  a  L o t  O f  C o n f i g u r a t i o n s  O u t p u t s.
    -----------------------------------------------------------------------------------------------------*/
    if (rank == 0){
        for (job=0; job<N_JOBS; job++){
            for (int q=0; q<=1; q++){
                if(q == 0){ vJE = v1_JE;}
                if(q == 1){ vJE = v2_JE;}
            }
            int Job; Job += 1;
            SEED++;
            count_exec++;
            Done=count_exec;
            Pending=(N_JOBS - count_exec);
            printf("\n");
            printf(" ");
            printf("\e[1;7m\x1B[4mTOTAL JOBS\x1B[0m\e[1;7m:%d  -  \x1B[4mALIVE\x1B[0m\e[1;7m:%d  -  \x1B[4mPENDING\x1B[0m\e[1;7m:%d  -  \x1B[4mDONE\x1B[0m\e[1;7m:%d \n", N_JOBS, n_procs, (Pending+1), (Done-1));
            printf("\x1B[0m\n");
            /*-------------------------------------------------------------------------------*/
            // -- Executing the loop for the potential U(r,t) with velocity v1_JE --
            if(v1_JE){
                printf("  ");
                printf("\e[100m\x1B[4mRUNNING CONFIGURATION(N%d)\x1B[0m\e[100m:  \x1B[4mNpol\x1B[0m\e[100m:%d - \x1B[4mm_trans\x1B[0m\e[100m:%d - \x1B[4mSEED\x1B[0m\e[100m:%d - \x1B[4mv_JE\x1B[0m\e[100m:%1.4lf\x1B[0m \n", (job+1), Npol, m_trans, SEED, v1_JE);
                printf("\x1B[0m\n");
                sprintf(filename_1,"/Users/jailsonoliveira/Desktop/poly_data_v1_N%i.txt",Job);  /* -- Automating the string name that contains the adress to write the output data -- */
                sprintf(filename_2,"/Users/jailsonoliveira/Desktop/dinamicamolecular/Particle_Langevin_Dynamic/historico_codigos/200320212352lgvpolymer20/CodigoSemSaidaGrafica/runsN50polymer/RunN25v0p005/work_v1_N%i.txt",Job);
                output1 = fopen(filename_1,"w");
                output2 = fopen(filename_2,"w");
               
                /* -- Main loop over the timesteps using v1_JE and a different seed value for each value of each s -- */
                for(s=1, delta_t=0.0, t=0.0; s<=steps; s++, t+=dt, delta_t+=dt){
                    timestep += 10;
                    print_step += 1;
                    RK4(t, Fx, Fy, Fz, x, y, z, vx, vy, vz, &v, Npol);
                    Energy = ENERGIES(t, x, y, z, vx, vy, vz, &v, Npol);
                    POLYDM = PolymerDownMembrane(t,z,&v,Npol);
                    POLYEXP = PolymerExponent(x,y,z,Npol);
                    POLYSIZE = PolymerSize(x,y,z,Npol);
                    Work = WORK(s,&v,x,y,z,Npol);
                    Rg = RG(x,y,z,Npol);
                    Cm = CM(x,y,z,Npol);
                    Avg_Velocity = AvgVelocity(t,x,y,z,Npol);
                    Avg_Energy = AvgEnergy(t,x,y,z,vx,vy,vz,&v,Npol);
                    INTERAC = testinterac(x,y,z,Npol);
                    fprintf(output1, "%5.4lf %10.10lf %10.10lf %10.10lf %10.10lf %.0f %10.10lf %10.10lf %10.10lf %10.10lf %10.10lf %10.10lf\n", t, Rg, Cm, Work, POLYEXP, POLYDM, POLYSIZE, Energy, *Fx, *Fy, *Fz, INTERAC);//, MTEST);
                    fprintf(output2, "%10.18lf\n", Work);
                    prog_bar("\033[1;39m Executando: ", print_step, steps);
                    if ( ((z[s]-z[s-1])-v1_JE/flory_exp) <= ((0.9523809524*R*v1_JE)/flory_exp) && s >= relaxation){v = v1_JE;} // 0.9525 is to make the inequality between true and false change more often.
                    else{v = 0.0;}
                    if (s > warmup){updowncontrol = true;}
                    else{updowncontrol = false;}
                }
                
                /* -- Printing average values associated with execution using v1_JE -- */
                if (rank == 0){
                    t_final = MPI_Wtime();
                    printf("\n\n");
                    printf(" ");
                    printf("\e[38;5;27m U(r,t) Velocity:           %1.4lf\n",v1_JE);
                    printf(" ");
                    printf("\e[38;5;27m Total Energy Average:      %3.5f u.e.\n",Avg_Energy);
                    printf(" ");
                    printf("\e[38;5;27m Velocity Average Value:    %3.5f u.v.\n",Avg_Velocity);
                    printf(" ");
                    printf("\e[38;5;27m MPI Version:               %d.%d \n", versao, subversao);
                    printf(" ");
                    printf("\e[38;5;27m Number of Tasks:           %d\n", n_procs);
                    printf(" ");
                    printf("\e[38;5;27m Rank:                      %d\n", rank);
                    printf(" ");
                    printf("\e[38;5;27m Executing on the Machine:  %s\n", maquina);
                    printf("\n");
                    printf(" ");
                    printf("\e[38;5;118m Task Finished in %3.5f seconds\n",t_final-t_inicial);
                    printf("\033[0;0m\n");
                } 
                print_step=0;
                printf("\n");
            }
           /*-------------------------------------------------------------------------------*/
            /*  -- Executing the loop for the potential U(r,t) with velocity v2_JE --  */
            if(v2_JE){
                printf("  ");
                printf("\e[100m\x1B[4mRUNNING CONFIGURATION(N%d)\x1B[0m\e[100m:  \x1B[4mNpol\x1B[0m\e[100m:%d - \x1B[4mm_trans\x1B[0m\e[100m:%d - \x1B[4mSEED\x1B[0m\e[100m:%d - \x1B[4mv_JE\x1B[0m\e[100m:%1.4lf\x1B[0m \n", (job+1), Npol, m_trans, SEED, v2_JE);
                printf("\x1B[0m\n");
                sprintf(filename_1,"/Users/jailsonoliveira/Desktop/poly_data_v2_N%i.txt",Job);
                sprintf(filename_2,"/Users/jailsonoliveira/Desktop/dinamicamolecular/Particle_Langevin_Dynamic/historico_codigos/200320212352lgvpolymer20/CodigoSemSaidaGrafica/runsN50polymer/RunN25v0p005/work_v2_N%i.txt",Job);
                output1 = fopen(filename_1,"w");
                output2 = fopen(filename_2,"w");

                /* -- Main loop over the timesteps using v2_JE and a different seed value for each value of each s -- */
                for(s=1, delta_t=0.0, t=0.0; s<=steps; s++, t+=dt, delta_t+=dt){
                    timestep += 10;
                    print_step += 1;
                    RK4(t, Fx, Fy, Fz, x, y, z, vx, vy, vz, &v, Npol);
                    Energy = ENERGIES(t, x, y, z, vx, vy, vz, &v, Npol);
                    POLYDM = PolymerDownMembrane(t,z,&v,Npol);
                    POLYEXP = PolymerExponent(x,y,z,Npol);
                    POLYSIZE = PolymerSize(x,y,z,Npol);
                    Work = WORK(s,&v,x,y,z,Npol);
                    Rg = RG(x,y,z,Npol);
                    Cm = CM(x,y,z,Npol);
                    Avg_Velocity = AvgVelocity(t,x,y,z,Npol);
                    Avg_Energy = AvgEnergy(t,x,y,z,vx,vy,vz,&v,Npol);
                    INTERAC = testinterac(x,y,z,Npol);
                    fprintf(output1, "%5.4lf %10.10lf %10.10lf %10.10lf %10.10lf %.0f %10.10lf %10.10lf %10.10lf %10.10lf %10.10lf %10.10lf\n", t, Rg, Cm, Work, POLYEXP, POLYDM, POLYSIZE, Energy, *Fx, *Fy, *Fz, INTERAC);//, MTEST);
                    fprintf(output2, "%10.18lf\n", Work);
                    prog_bar("\033[1;39m Executando: ", print_step, steps);
                    if ( ((z[s]-z[s-1])-v2_JE/flory_exp) <= ((0.9523809524*R*v2_JE)/flory_exp) && s >= relaxation){v = v2_JE;}
                    else{v = 0.0;}
                    if (s > warmup){updowncontrol = true;}
                    else{updowncontrol = false;}
                }
            
                // Print for the configuration using v2_JE:
                if (rank == 0){
                    t_final = MPI_Wtime();
                    printf("\n\n");
                    printf(" ");
                    printf("\e[38;5;27m U(r,t) Velocity:           %1.4lf\n",v2_JE);
                    printf(" ");
                    printf("\e[38;5;27m Total Energy Average:      %3.5f u.e.\n",Avg_Energy);
                    printf(" ");
                    printf("\e[38;5;27m Velocity Average Value:    %3.5f u.v.\n",Avg_Velocity);
                    printf(" ");
                    printf("\e[38;5;27m MPI Version:               %d.%d \n", versao, subversao);
                    printf(" ");
                    printf("\e[38;5;27m Number of Tasks:           %d\n", n_procs);
                    printf(" ");
                    printf("\e[38;5;27m Rank:                      %d\n", rank);
                    printf(" ");
                    printf("\e[38;5;27m Executing on the Machine:  %s\n", maquina);
                    printf("\n");
                    printf(" ");
                    printf("\e[38;5;118m Task Finished in %3.5f seconds\n",t_final-t_inicial);
                    printf("\033[0;0m\n\n\n");
                    printf("\033[1;39m =============================================================================");
                    printf("\033[0;0m\n");
                }
                print_step=0;
                printf("\n");
            }
        }

        /*-------------------------------------------------------------------------------------------------------------------------
            --R e t r i e v i n g  F r o m  P o o l  T h e  I n d e p e n d e n t l y  C o m p u t e d  P a r t i a l  V a l u e s.
        ---------------------------------------------------------------------------------------------------------------------------*/
        if (rank == 0){
            for (MASTER_THREAD_INIT = 1; MASTER_THREAD_INIT < n_procs; MASTER_THREAD_INIT++){ 
                MPI_Recv(&RK4, 1, MPI_DOUBLE, MASTER_THREAD_INIT, tag, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Recv(&FORCES, 1, MPI_DOUBLE, MASTER_THREAD_INIT, tag, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            }
        }

        /*--------------------------------------------------------------------------------------------------------
            --T h e  V a l u e s  T h a t  W e r e  R e t r i e v e d  A b o v e  A r e  S e n t  T o    
            M P I  M a s t e r - P r o c  A n d  A r e  C o m p l e t t e l y  R e t r i e v e d. ( J O I N )    
        ----------------------------------------------------------------------------------------------------------*/
        else{
            t_final = MPI_Wtime();
            MPI_Send(&RK4, 1, MPI_DOUBLE, MASTER_THREAD_FINAL, tag, MPI_COMM_WORLD);
            MPI_Send(&FORCES, 1, MPI_DOUBLE, MASTER_THREAD_FINAL, tag, MPI_COMM_WORLD);
        }
        fclose(output1);
        fclose(output2);
    }

    MPI_Finalize();

    free(Fx);
    free(Fy);
    free(Fz);
    free(vx);
    free(vy);
    free(vz);
    free(x);
    free(y);
    free(z);
    
    printf("\n\n");
    printf("Press \x1B[4mENTER\x1B[0m to close window...");
    getchar();
    #ifdef __APPLE__
    system("osascript -e 'tell application\"Terminal\" to close windows 0\n'");
    #elif _WIN32 | __linux__

    return 0;

    #endif
}

I have also tried allocating the more global quantities (arrays) only once, for example:

static double *Fx = NULL, *Fy = NULL, *Fz = NULL;
if (Fx == NULL && Fy == NULL && Fz == NULL){
    Fx = (double *) malloc((size_t) (N_CONFIGURATIONS*steps*sizeof(double)));
    Fy = (double *) malloc((size_t) (N_CONFIGURATIONS*steps*sizeof(double)));
    Fz = (double *) malloc((size_t) (N_CONFIGURATIONS*steps*sizeof(double)));
}

It seems the problem is not in the form of the allocation. I suspect the MPI recv and send calls or the if (rank == 0) conditional. Maybe this error is associated with these features, or with the region of the code where they are called.

Thanks a lot in advance,
Jailson!

Answer 1

Is it really a problem caused by MPI?

If you look closely at the stack trace that the MPI runtime so gracefully produces:

...
[MacBook-de-Jailson:48014] [ 3] 0 libsystem_c.dylib  0x00007fff5fa203c9 fprintf + 176 <<<<
[MacBook-de-Jailson:48014] [ 4] 0 lgvpolymer20       0x00000001083c8bb7 main + 5607
...

The root of the segfault is a call to fprintf. There are four such calls in your code, in two identical groups of:

fprintf(output1, "%5.4lf %10.10lf %10.10lf %10.10lf %10.10lf %.0f %10.10lf %10.10lf %10.10lf %10.10lf %10.10lf %10.10lf\n", t, Rg, Cm, Work, POLYEXP, POLYDM, POLYSIZE, Energy, *Fx, *Fy, *Fz, INTERAC);//, MTEST);
fprintf(output2, "%10.18lf\n", Work);

All values being passed are doubles, except Fx, Fy, and Fz, which are pointers, but all three of them are dereferenced in main before the call. If any of Fx, Fy, or Fz were NULL, the segfault would originate in main itself, not inside fprintf.

On the other hand, fprintf in macOS comes from FreeBSD and is just a wrapper around vfprintf_l, which in turn is a wrapper around __xvprintf. Before calling __xvprintf, vfprintf_l locks the file stream, and it unlocks it after the call. Since __xvprintf is not in the stack trace, the problem must be in locking the file stream. The lock is a member of the FILE structure. There are two possible reasons for locking to produce a segfault:

The first is that the lock object may be corrupted. Your code gives that a way to happen. Notice how you allocate memory for filename_1 and filename_2:

if(filename_1 == NULL && filename_2 == NULL){
    filename_1 = (char *) malloc((size_t) (N_CONFIGURATIONS*steps*sizeof(char) + 1));
    filename_2 = (char *) malloc((size_t) (N_CONFIGURATIONS*steps*sizeof(char) + 1));
}

So each of these can hold N_CONFIGURATIONS * steps + 1 bytes. Further down the code, you do:

sprintf(filename_2,"/Users/jailsonoliveira/Desktop/dinamicamolecular/Particle_Langevin_Dynamic/historico_codigos/200320212352lgvpolymer20/CodigoSemSaidaGrafica/runsN50polymer/RunN25v0p005/work_v1_N%i.txt",Job);

That string is 182 bytes long, plus the number of digits in the decimal representation of Job. So if N_CONFIGURATIONS * steps + 1 is less than that, sprintf writes past the end of the memory allocated for filename_2, possibly destroying data further along in memory. If the lock object allocated by fopen happens to be there, it will be overwritten. But the filename consists of non-zero ASCII characters, while your trace reports a failing address of 0x68 and a frame at 0x0000000000000000, both of which point to a null pointer rather than to a pointer overwritten with text, so that's probably not the case.
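Even if it is not the cause here, the overflow risk is easy to remove. A minimal sketch, assuming a fixed-size buffer and a hypothetical helper named build_output_path (neither is in your code): snprintf never writes more than the buffer size and returns the length it needed, so truncation can be detected.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not in the original code): builds
 * "<dir>/<prefix>_N<job>.txt" into buf without ever writing past
 * bufsize bytes. Returns 0 on success, -1 if the path does not fit
 * or snprintf reports an error. */
int build_output_path(char *buf, size_t bufsize,
                      const char *dir, const char *prefix, int job)
{
    int needed = snprintf(buf, bufsize, "%s/%s_N%d.txt", dir, prefix, job);

    /* snprintf returns the length the full string would have had,
     * not counting the terminating NUL. */
    if (needed < 0 || (size_t)needed >= bufsize)
        return -1;
    return 0;
}
```

With a buffer such as char filename_2[512], a call like build_output_path(filename_2, sizeof filename_2, dir, "work_v1", Job) either produces the complete path or fails visibly, and the buffer size no longer depends on N_CONFIGURATIONS * steps.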

The other possibility is simply that the FILE* pointer is NULL. Since you are not checking the return value of any of the fopen calls, it is perfectly possible that one of them returned NULL because it could not create the desired file.

output1 = fopen(filename_1,"w");
output2 = fopen(filename_2,"w");
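A checked version of those calls could look like the sketch below. The wrapper name open_for_writing is made up for illustration; fopen, errno, and strerror are standard C.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical wrapper (not in the original code): opens path for
 * writing and, if that fails, reports why on stderr. The caller must
 * still test the result before passing it to fprintf. */
FILE *open_for_writing(const char *path)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL) {
        fprintf(stderr, "cannot open '%s' for writing: %s\n",
                path, strerror(errno));
    }
    return fp;
}
```

In the job loop, something like output1 = open_for_writing(filename_1); followed by if (output1 == NULL) and an orderly abort would turn the segfault into a readable error message naming the file and the reason.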

Why would fopen fail to create a file for writing? One possible reason is that the directory does not exist. But then it would fail on the very first job. My hypothesis is that you are hitting the limit on open files.

Check this out:

if (rank == 0){
    for (job=0; job<N_JOBS; job++){
        // ...
        if(v1_JE){
            // ...
            output1 = fopen(filename_1,"w");
            output2 = fopen(filename_2,"w");
            // ...
        }
        if(v2_JE){
            // ...
            output1 = fopen(filename_1,"w");
            output2 = fopen(filename_2,"w");
            // ...
        }
    }
    fclose(output1);
    fclose(output2);
}

This is the outline of your code with everything except the file open/close operations removed. You are opening files that you never close: only the two files opened by the second part of the last job are closed. Let's check the open file limit on a modern version of macOS:

$ launchctl limit maxfiles
    maxfiles    256            unlimited

The soft limit is 256 open files. Your screenshot reads "DONE:59", so you must have at least 59 x 4 = 236 open files. Add two files from the first part of the current job and you end up with 238 open files. Add a couple of network sockets and you are already close to the limit of 256 files. Trying to open even more files in the second part hits the limit, and fopen returns NULL. You do not check the return value and simply call fprintf with that NULL, hence the segfault.
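The limit that actually applies to a running process can also be read from inside the program. A rough sketch, assuming a POSIX system (getrlimit and RLIMIT_NOFILE come from sys/resource.h); the two helper names are made up for illustration:

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: reads the soft and hard limits on open file
 * descriptors for the current process. Returns 0 on success, -1 if
 * getrlimit fails. */
int get_open_file_limit(rlim_t *soft, rlim_t *hard)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    *soft = rl.rlim_cur;
    *hard = rl.rlim_max;
    return 0;
}

/* Prints one limit, treating RLIM_INFINITY as "unlimited". */
static void print_one_limit(const char *label, rlim_t value)
{
    if (value == RLIM_INFINITY)
        printf("%s: unlimited\n", label);
    else
        printf("%s: %llu\n", label, (unsigned long long)value);
}

/* Hypothetical helper: prints both limits, e.g. once at program start. */
void print_open_file_limit(void)
{
    rlim_t soft, hard;

    if (get_open_file_limit(&soft, &hard) != 0) {
        perror("getrlimit");
        return;
    }
    print_one_limit("soft limit on open files", soft);
    print_one_limit("hard limit on open files", hard);
}
```

Calling print_open_file_limit() from rank 0 before the job loop shows how many descriptors the run has available. Raising the limit would only postpone the failure; the files still need to be closed.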

So, no, the problem is not in MPI. The problem is that you must close all open files at the right time and place and also always check the return value of fopen.
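One way to arrange that is to give every run its own open, write, close sequence, so no stream outlives the loop that writes to it. The sketch below is only an outline under assumptions: write_run and run_all_jobs are hypothetical names, the per-step output is a placeholder for your real fprintf calls, and the file name pattern is simplified.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for one timestep loop (the v1_JE or the v2_JE
 * run). The file is opened, checked, written, and closed inside the
 * same function. Returns 0 on success, -1 on any I/O failure. */
int write_run(const char *path, int steps)
{
    FILE *out = fopen(path, "w");
    if (out == NULL) {
        perror(path);
        return -1;
    }

    for (int s = 1; s <= steps; s++)
        fprintf(out, "%d\n", s);   /* placeholder for the real output line */

    if (fclose(out) != 0) {        /* close before the next run opens a file */
        perror(path);
        return -1;
    }
    return 0;
}

/* Hypothetical outline of the job loop: two runs per job, each with its
 * own file, so at most one output file is open at any time. */
int run_all_jobs(const char *dir, int n_jobs, int steps)
{
    char path[512];

    for (int job = 1; job <= n_jobs; job++) {
        for (int q = 1; q <= 2; q++) {
            int needed = snprintf(path, sizeof path,
                                  "%s/work_v%d_N%d.txt", dir, q, job);
            if (needed < 0 || (size_t)needed >= sizeof path)
                return -1;
            if (write_run(path, steps) != 0)
                return -1;
        }
    }
    return 0;
}
```

With this structure the number of open files stays constant no matter how large N_JOBS gets, and a failing fopen stops the run with a message instead of a segfault.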

Some people are mad at Apple that they put such a low default limit on the number of open files. But this is a perfect illustration of why it's actually a good thing and how it helps expose coding errors that leave many files open.

solution1
0 ACCEPTED 2022-06-09 19:24:01
