My filesystem (ZFS specifically) supports copy-on-write (COW), i.e. a copy (if done right) is a very cheap constant-time operation that does not actually copy the underlying content. The content is copied only once I write to or modify the new file.
Actually, I just found out that ZFS-on-Linux has not implemented that for userspace yet (right?). But e.g. BTRFS and XFS have. (See here, here, here, here.)
For the (GNU) cp utility, you would pass the --reflink=always option (see here). cp calls ioctl(dest_fd, FICLONE, src_fd) (see here, here).
How would I get this behavior (if possible) in Python?
I assume that "zero-copy" (e.g. here via os.sendfile) would not result in such behavior, right? Looking at shutil's _fastcopy_sendfile implementation (here), it is still a loop around os.sendfile using a custom byte count (meant to be the block size, max(os.fstat(infd).st_size, 2 ** 23)). Or would it?
Is the COW at the file level or at the block level?
If possible, I want this to be generic and cross-platform as well, although my question here is somewhat Linux-focused. A related question specifically about Mac seems to be this. The Mac OS X cp has the -c option to clone a file.
While searching further, I actually found the answer, and a related issue report.
Issue 37157 (shutil: add reflink=False to file copy functions to control clone/CoW copies (use copy_file_range)) is exactly about that, and would use FICLONE / FICLONERANGE on Linux.
So I assume that shutil would support this in upcoming Python versions (maybe starting with Python 3.9?).
There is os.copy_file_range (since Python 3.8), which wraps copy_file_range (Linux).
However, in issue 37159 (Use copy_file_range() in shutil.copyfile() (server-side copy)), Giampaolo Rodola writes:
Nope, [copy_file_range] doesn't [support CoW] (see man page). We can simply use FICLONE (cp does the same).
However, I'm not sure this is correct, as the copy_file_range man page says:
copy_file_range() gives filesystems an opportunity to implement "copy acceleration" techniques, such as the use of reflinks (i.e., two or more inodes that share pointers to the same copy-on-write disk blocks) or server-side-copy (in the case of NFS).
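To try this out, here is a minimal sketch of a whole-file copy through os.copy_file_range. The function name copy_with_copy_file_range is made up for illustration, and the fallback to shutil.copyfile is my own choice. Whether the kernel actually shares blocks (reflink) or just copies in-kernel depends on the filesystem and kernel version, so this does not guarantee COW.

```python
import os
import shutil


def copy_with_copy_file_range(src, dst):
    """Copy src to dst via os.copy_file_range (hypothetical helper).

    Returns True if copy_file_range did the work, False if we fell back
    to a plain shutil.copyfile. A True result does NOT prove that blocks
    are shared; that is up to the filesystem.
    """
    if not hasattr(os, "copy_file_range"):  # Python < 3.8 or non-Linux
        shutil.copyfile(src, dst)
        return False
    try:
        with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
            remaining = os.fstat(fsrc.fileno()).st_size
            while remaining > 0:
                # The call may copy fewer bytes than requested, so loop.
                n = os.copy_file_range(fsrc.fileno(), fdst.fileno(), remaining)
                if n == 0:  # source shrank while copying
                    break
                remaining -= n
        return True
    except OSError:
        # e.g. ENOSYS, EXDEV or EINVAL on older kernels / odd filesystems
        shutil.copyfile(src, dst)
        return False
```

Usage would be something like copy_with_copy_file_range("a.bin", "b.bin"). Tools such as filefrag -v (or btrfs filesystem du on BTRFS) could then be used to check whether the extents are really shared.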
Issue 26826 (Expose new copy_file_range() syscall in os module) has this comment by Giampaolo Rodola:
I think data deduplication / CoW / reflink copy is better implemented via FICLONE. "cp --reflink" uses it, I presume because it's older than copy_file_range(). ...
Again, as noted already in the question, this does not work on ZFS yet, see this issue.
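Until shutil supports it, the FICLONE ioctl that cp --reflink uses can be called directly from Python. The following is only a sketch under some assumptions: reflink_or_copy is a made-up name, fcntl.FICLONE is only exposed by newer Python versions (3.12+, as far as I know), and the hard-coded fallback value 0x40049409 is what _IOW(0x94, 9, int) evaluates to on x86 and ARM Linux, which may differ on other architectures.

```python
import os
import shutil
import sys

try:
    import fcntl
except ImportError:  # not available on Windows
    fcntl = None

# Assumption: use the constant from the fcntl module if this Python has it,
# otherwise the x86/ARM Linux value of _IOW(0x94, 9, int).
FICLONE = getattr(fcntl, "FICLONE", 0x40049409)


def reflink_or_copy(src, dst):
    """Clone src to dst with FICLONE; fall back to a normal copy.

    Returns True if the file was cloned (COW), False if it was copied.
    (Hypothetical helper, not part of any library.)
    """
    if fcntl is not None and sys.platform.startswith("linux"):
        with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
            try:
                # Same call that cp makes: ioctl(dest_fd, FICLONE, src_fd)
                fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
                return True
            except OSError:
                # Filesystem does not support reflinks (ext4, tmpfs,
                # ZFS-on-Linux at the time of writing), or src and dst
                # are on different filesystems.
                pass
    shutil.copyfile(src, dst)
    return False
```

This mirrors cp --reflink=auto rather than --reflink=always: if cloning is not possible, it silently falls back to a regular copy, and the return value says which of the two happened.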