Copy file in Python with copy-on-write (COW)
My filesystem (ZFS specifically) supports copy-on-write (COW), i.e. a copy (if done right) is a very cheap constant-time operation and does not actually copy the underlying content. The content is copied only once I write to or modify the new file.
Actually, I just found out that ZFS-on-Linux has not implemented that for userspace yet (right?). But e.g. BTRFS and XFS have. (See here, here, here, here.)
For the (GNU) cp utility, you would pass the --reflink=always option (see here). cp calls ioctl(dest_fd, FICLONE, src_fd) (see here, here).
How would I get this behavior (if possible) in Python?
I assume that "zero-copy" (e.g. here via os.sendfile) would not result in such behavior, right? Because looking at shutil's _fastcopy_sendfile implementation (here), it is still a loop around os.sendfile using some custom byte count (supposed to be the block size, max(os.fstat(infd).st_size, 2 ** 23)). Or would it?
Is the COW on a file level or on a block level?
If possible, I want this to be generic and cross-platform as well, although my question here is somewhat Linux focused. A related question specifically about Mac seems to be this. The MacOSX cp has the -c option to clone a file.
While searching further, I actually found the answer, and a related issue report.
Issue 37157 (shutil: add reflink=False to file copy functions to control clone/CoW copies (use copy_file_range)) is exactly about that, and it would use FICLONE / FICLONERANGE on Linux.
So I assume that shutil would support this in upcoming Python versions (maybe starting with Python 3.9?).
There is os.copy_file_range (since Python 3.8), which wraps copy_file_range (Linux).
However, according to issue 37159 (Use copy_file_range() in shutil.copyfile() (server-side copy)), Giampaolo Rodola:
Nope, [copy_file_range] doesn't [support CoW] (see man page). We can simply use FICLONE (cp does the same).
However, I'm not sure this is correct, as the copy_file_range man page says:
copy_file_range() gives filesystems an opportunity to implement "copy acceleration" techniques, such as the use of reflinks (i.e., two or more inodes that share pointers to the same copy-on-write disk blocks) or server-side-copy (in the case of NFS).
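For illustration, here is a minimal sketch of how a copy via os.copy_file_range could look. The function name copy_with_copy_file_range and the chunk parameter are made up for this example, and this is not what shutil does. Whether the kernel actually creates a reflink or just does an in-kernel byte copy is up to the filesystem, as the man page excerpt above says, so treat this as "possibly COW", not guaranteed.

```python
import os
import shutil


def copy_with_copy_file_range(src, dst, chunk=2 ** 30):
    """Copy src to dst via os.copy_file_range, else fall back to shutil.copyfile.

    Hypothetical helper for illustration only.
    Returns True if copy_file_range did the copy, False if the fallback ran.
    """
    if not hasattr(os, "copy_file_range"):  # not Linux, or Python < 3.8
        shutil.copyfile(src, dst)
        return False
    try:
        with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
            remaining = os.fstat(fsrc.fileno()).st_size
            while remaining > 0:
                # The kernel advances both file offsets and may copy fewer
                # bytes than requested, hence the loop.
                n = os.copy_file_range(
                    fsrc.fileno(), fdst.fileno(), min(chunk, remaining)
                )
                if n == 0:  # source ended early (e.g. it was truncated)
                    break
                remaining -= n
        return True
    except OSError:
        # The syscall can be unsupported for this kernel/filesystem
        # combination; redo the copy the ordinary way.
        shutil.copyfile(src, dst)
        return False
```

The return value only tells you which code path ran, not whether blocks are actually shared on disk.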
Issue 26826 (Expose new copy_file_range() syscall in os module) has this comment by Giampaolo Rodola:
I think data deduplication / CoW / reflink copy is better implemented via FICLONE. "cp --reflink" uses it, I presume because it's older than copy_file_range().
...
Again, as noted already in the question, this does not work on ZFS yet; see this issue.
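Until shutil supports it, the FICLONE ioctl that cp uses can be called directly through fcntl.ioctl. Below is a rough sketch, not a definitive implementation. The name reflink_or_copy is made up for this example, and the numeric value of FICLONE is an assumption: it is what _IOW(0x94, 9, int) from <linux/fs.h> works out to on x86 and ARM Linux, and other architectures may encode it differently. If the clone fails (filesystem without reflink support, different filesystems, non-Linux), the sketch falls back to a regular copy.

```python
import shutil

try:
    import fcntl  # POSIX only
except ImportError:
    fcntl = None

# Assumption: FICLONE = _IOW(0x94, 9, int) as encoded on x86/ARM Linux.
FICLONE = 0x40049409


def reflink_or_copy(src, dst):
    """Try to clone src into dst with FICLONE, else make a regular copy.

    Hypothetical helper for illustration only.
    Returns True if the reflink succeeded, False if a regular copy was made.
    """
    if fcntl is not None:
        try:
            with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
                # Same call as cp: ioctl(dest_fd, FICLONE, src_fd)
                fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
            return True
        except OSError:
            pass  # no reflink support here; fall through to a plain copy
    shutil.copyfile(src, dst)
    return False
```

Usage would be something like reflink_or_copy("big.img", "big-clone.img"). On BTRFS or XFS (with reflink enabled) I would expect it to return True, and False on filesystems without reflink support.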