I am writing output from a simulation to a file using the following code:
sprintf(filename, "time_data.dat");
FILE *fp = fopen(filename, "w");
for (i = 0; i < ntime; i++) {
    compute_data();
    fprintf(fp, "%d %lf %lf\n", step, time_val, rho_rms);
}
return;
On my desktop, I see the file time_data.dat update every few hours (compute_data() takes a few hundred seconds per time step, with OpenMP on an i7 machine). I have now submitted the job to a cluster node (an E5-2650 processor running Ubuntu Server). I have been waiting for 5 days now, and not a single line has appeared in the file yet. I run
tail -f time_data.dat
to check the output. The simulation will take another couple of weeks to complete, and I can't wait that long to see whether my output is good. Is there a way to prod the OS on the node into flushing its buffers without disturbing the computation? If I cancel the job now, I am sure there will be no output. Please note that the disk the output file is written to is shared over NFS between the compute nodes and the master node. Is this causing any trouble? Is there a temporary location where the output is actually being written in the meantime?
PS: du -h shows the file with size 0. I also tried ls -l /proc/$ID/fd to confirm that the file did open.
You might use lsof, or simply ls -l /proc/$(pidof yoursimulation)/fd, to check (on the cluster node) that time_data.dat has indeed been opened.
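Spelled out, assuming the simulation binary is called yoursimulation (a placeholder name):

```shell
# "yoursimulation" is a placeholder for the actual binary name.
PID=$(pidof yoursimulation)

# Either of these lists the process's open files; time_data.dat
# should appear among them if the fopen() succeeded.
lsof -p "$PID"
ls -l "/proc/$PID/fd"
```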
For such a long-running program, it is worthwhile to flush the output stream explicitly, e.g. with fflush(3) after each record, or to switch the stream to line buffering with setvbuf(3).