Understanding concurrent file writes from multiple processes

From here: Is file append atomic in UNIX?

Consider a case where multiple processes open the same file and append to it. O_APPEND guarantees that seeking to the end of the file and then beginning the write operation is atomic, so multiple processes can append to the same file and no process will overwrite any other process's write, as long as each write's size is <= PIPE_BUF.

I wrote a test program where multiple processes open and write to the same file (write(2)). I make sure each write size is > PIPE_BUF (4k). I was expecting to see instances where a process overwrites someone else's data, but that doesn't happen. I tested with different write sizes. Is that just luck, or is there a reason why that doesn't happen? My ultimate goal is to understand whether multiple processes appending to the same file need to coordinate their writes.

Here is the complete program. Every process creates an int buffer, fills all values with its rank, opens a file and writes to it.

Specs: OpenMPI 1.4.3 on openSUSE 11.3 64-bit

Compiled as: mpicc -O3 test.c, run as: mpirun -np 8 ./a.out

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>

int
main(int argc, char** argv) {
    int rank, size, i, bufsize = 134217728, fd, status = 0;
    ssize_t bytes_written;
    int* buf;
    char* filename = "/tmp/testfile.out";

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    buf = (int*) malloc(bufsize * sizeof(int));
    if (buf == NULL) {
        status = -1;
        perror("Could not malloc");
        goto finalize;
    }
    /* Fill the buffer with this process's rank so that interleaved
       writes would be visible in the output file. */
    for (i = 0; i < bufsize; i++)
        buf[i] = rank;

    /* O_CREAT added so the file need not exist beforehand;
       O_APPEND is the flag whose behavior is under test. */
    if (-1 == (fd = open(filename, O_CREAT|O_APPEND|O_WRONLY, S_IWUSR))) {
        perror("Cant open file");
        status = -1;
        goto end;
    }

    /* Write the whole buffer in one call. The length is in bytes,
       hence bufsize * sizeof(int), not bufsize as originally written. */
    bytes_written = write(fd, buf, bufsize * sizeof(int));
    if (bytes_written != (ssize_t)(bufsize * sizeof(int))) {
        perror("Error during write");
        printf("ret value: %zd\n", bytes_written);
        status = -1;
    }

    if (-1 == close(fd)) {
        perror("Error during close");
        status = -1;
    }
end:
    free(buf);
finalize:
    MPI_Finalize();
    return status;
}
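
To make the check concrete, here is a minimal companion checker (a sketch, not part of the original question): assuming the file at /tmp/testfile.out was produced by the program above, every block of 134217728 ints came from a single write() and must therefore contain a single rank value.

#include <stdio.h>

int
main(void) {
    const char* filename = "/tmp/testfile.out"; /* same path as the test */
    const size_t ints_per_write = 134217728;    /* must match bufsize    */
    FILE* f = fopen(filename, "rb");
    int v, first = 0;
    size_t i = 0;

    if (f == NULL) {
        perror("fopen");
        return 1;
    }
    /* Each block of ints_per_write ints came from one write(), so it
       must hold a single value; a mixed block means a torn write. */
    while (fread(&v, sizeof v, 1, f) == 1) {
        if (i % ints_per_write == 0)
            first = v;              /* first int of a new block */
        else if (v != first) {
            printf("interleaving at int offset %zu: %d vs %d\n",
                   i, v, first);
            fclose(f);
            return 1;
        }
        i++;
    }
    fclose(f);
    printf("no interleaving detected (%zu ints checked)\n", i);
    return 0;
}

Remember to remove /tmp/testfile.out between runs, since O_APPEND adds to whatever is already there.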

Atomicity of writes less than PIPE_BUF applies only to pipes and FIFOs. For file writes, POSIX says:

This volume of POSIX.1-2008 does not specify behavior of concurrent writes to a file from multiple processes. Applications should use some form of concurrency control.

...which means that you're on your own - different UNIX-likes will give different guarantees.
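
As one concrete form of the concurrency control POSIX asks for, here is a minimal sketch using an advisory lock around each append (flock() is Linux/BSD; strictly POSIX code would use fcntl() record locks instead):

#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Append one record under an advisory lock, so that writers of any
   size queue up instead of interleaving. */
int locked_append(const char* path, const void* buf, size_t len) {
    int fd = open(path, O_CREAT|O_WRONLY|O_APPEND, 0644);
    ssize_t n = -1;
    if (fd == -1)
        return -1;
    if (flock(fd, LOCK_EX) == 0) {   /* exclusive lock: one writer at a time */
        n = write(fd, buf, len);
        flock(fd, LOCK_UN);          /* release before closing */
    }
    close(fd);
    return (n == (ssize_t)len) ? 0 : -1;
}

With the lock held, even writes far larger than any atomicity guarantee cannot interleave, at the cost of serialising all writers.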

Firstly, O_APPEND, or the equivalent FILE_APPEND_DATA on Windows, means that increments of the maximum file extent (file "length") are atomic under concurrent writers, and by any amount, not just up to PIPE_BUF. This is guaranteed by POSIX, and Linux, FreeBSD, OS X and Windows all implement it correctly. Samba also implements it correctly; NFS before v5 does not, as it lacks the wire-format capability to append atomically. So if you open your file append-only, concurrent writes will not tear with respect to one another on any major OS, unless NFS is involved.
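
To see what is being made atomic here, compare with the non-append pattern that O_APPEND replaces, where the seek and the write are two separate system calls (an illustrative sketch, not from the original answer):

#include <unistd.h>
#include <sys/types.h>

/* The racy two-step pattern: between the lseek() and the write(),
   another process can append, and this write then lands on top of
   that data. Opening with O_APPEND performs both steps atomically. */
ssize_t racy_append(int fd, const void* buf, size_t len) {
    if (lseek(fd, 0, SEEK_END) == (off_t)-1)   /* step 1: find the end  */
        return -1;
    return write(fd, buf, len);                /* step 2: may overwrite */
}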

This says nothing about whether reads will ever see a torn write though, and on that POSIX says the following about atomicity of read() and write() to regular files:

All of the following functions shall be atomic with respect to each other in the effects specified in POSIX.1-2008 when they operate on regular files or symbolic links ... [many functions] ... read() ... write() ... If two threads each call one of these functions, each call shall either see all of the specified effects of the other call, or none of them. [Source]

and

Writes can be serialized with respect to other reads and writes. If a read() of file data can be proven (by any means) to occur after a write() of the data, it must reflect that write(), even if the calls are made by different processes. [Source]

but conversely:

This volume of POSIX.1-2008 does not specify behavior of concurrent writes to a file from multiple processes. Applications should use some form of concurrency control. [Source]

A safe interpretation of all three of these requirements would suggest that all writes overlapping an extent in the same file must be serialised with respect to one another and to reads such that torn writes never appear to readers.

A less safe, but still allowed interpretation could be that reads and writes only serialise with each other between threads inside the same process, and between processes writes are serialised with respect to reads only (i.e. there is sequentially consistent I/O ordering between threads in a process, but between processes I/O is only acquire-release).

Of course, just because the standard requires these semantics doesn't mean implementations comply, though in fact FreeBSD with ZFS behaves perfectly, very recent Windows (10.0.14393) with NTFS behaves perfectly, and recent Linuxes with ext4 behave correctly if O_DIRECT is on. If you would like more detail on how well major OSes and filing systems comply with the standard, see this answer.
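
As a hedged illustration of the O_DIRECT caveat just mentioned: on Linux, O_DIRECT bypasses the page cache but imposes alignment requirements on the buffer, the length, and the file offset, so an append sketch looks roughly like this (4096-byte alignment assumed here; the real requirement depends on the filesystem and device):

#define _GNU_SOURCE   /* exposes O_DIRECT on Linux */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int
main(void) {
    void* buf;
    int fd;

    /* O_DIRECT needs an aligned buffer and an aligned length; for an
       append, the current end of file generally must be aligned too. */
    if (posix_memalign(&buf, 4096, 4096) != 0)
        return 1;
    memset(buf, 0, 4096);

    fd = open("/tmp/testfile.out", O_WRONLY|O_APPEND|O_CREAT|O_DIRECT, 0644);
    if (fd == -1)
        return 1;
    if (write(fd, buf, 4096) != 4096)
        perror("direct write");
    close(fd);
    free(buf);
    return 0;
}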

It's not luck, in the sense that if you dig into the kernel you can probably prove that in your particular circumstances it will never happen that one process's write is interleaved with another's. I am assuming that:

  • You are not hitting any file size limits
  • You are not filling the filesystem in which you create the test file
  • The file is a regular file (not a socket, pipe, or something else)
  • The filesystem is local
  • The buffer does not span multiple virtual memory mappings (this one is known to be true, because it's malloc()ed, which puts it on the heap, which is contiguous).
  • The processes aren't interrupted, signaled, or traced while write() is busy.
  • There are no disk I/O errors, RAM failures, or any other abnormal conditions.
  • (Maybe others)

You will probably indeed find that if all those assumptions hold, the kernel of the operating system you happen to be using always carries out a single write() system call as a single, atomic, contiguous write to the file.

That doesn't mean you can count on this always being true. It might stop being true when:

  • the program is run on a different operating system
  • the file moves to an NFS filesystem
  • the process gets a signal while the write() is in progress and the write() returns a partial result (fewer bytes than requested). Not sure if POSIX really allows this to happen, but I program defensively (see the retry-loop sketch after this list)!
  • etc...
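
On the partial-write point above, a common defensive pattern is a retry loop; a minimal sketch (note that under O_APPEND each retry is a separate append, so the loop protects against short writes, not against interleaving by other processes):

#include <errno.h>
#include <unistd.h>

/* Write all of buf, retrying on EINTR and on short writes. Each retry
   is its own write() call, so another process's append can land between
   the pieces -- which is exactly why coordination may still be needed. */
ssize_t write_all(int fd, const void* buf, size_t len) {
    const char* p = buf;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n == -1) {
            if (errno == EINTR)
                continue;          /* interrupted before any byte: retry */
            return -1;             /* real error */
        }
        p += n;                    /* short write: advance and retry */
        left -= (size_t)n;
    }
    return (ssize_t)len;
}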

So your experiment can't prove that you can count on non-interleaved writes.
