Copy file in Python with copy-on-write (COW)

My filesystem (FS) (ZFS specifically) supports copy-on-write (COW), i.e. a copy (if done right) is a very cheap, constant-time operation that does not actually copy the underlying content. The content is only copied once I write to or modify the new file.

Actually, I just found out that ZFS-on-Linux has not implemented this for userspace yet (right?). But e.g. BTRFS and XFS have. (See here, here, here, here.)

For the (GNU) cp utility, you would pass the --reflink=always option (see here). cp calls ioctl(dest_fd, FICLONE, src_fd) (see here, here).

How would I get this behavior (if possible) in Python?

I assume that "zero-copy" (e.g. here via os.sendfile) would not result in such behavior, right? Looking at shutil's _fastcopy_sendfile implementation (here), it is still just a loop around os.sendfile with a custom per-call byte count, max(os.fstat(infd).st_size, 2 ** 23), i.e. the source file size but at least 8 MiB. Or would it give CoW after all?
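For comparison, here is a minimal sketch of such a sendfile() loop. It is modeled on, not copied from, shutil's private helper, and sendfile_copy is a hypothetical name. It assumes Linux, where os.sendfile can write to a regular file. Each call moves bytes from the source into the destination inside the kernel. This avoids user-space buffers, but the destination still gets freshly written data blocks: nothing in the loop asks the filesystem to share extents.

```python
import os


def sendfile_copy(src, dst):
    """Copy src to dst with os.sendfile(); returns the number of bytes copied."""
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        infd, outfd = fsrc.fileno(), fdst.fileno()
        # Bytes requested per call: the file size, but at least 8 MiB,
        # mirroring the expression used in shutil.
        blocksize = max(os.fstat(infd).st_size, 2 ** 23)
        offset = 0
        while True:
            # Explicit offset for the source; the destination's file
            # position advances by the number of bytes written.
            sent = os.sendfile(outfd, infd, offset, blocksize)
            if sent == 0:  # end of the source file
                break
            offset += sent
    return offset
```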

Is the COW done at the file level or at the block level?

If possible, I want this to be generic and cross-platform as well, although my question here is somewhat Linux-focused. A related question specifically about Mac seems to be this one. The macOS cp has the -c option to clone a file.
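One generic option is to delegate to the cp tools named above. This sketch assumes a platform cp is on PATH. It uses GNU cp --reflink=always on Linux, which fails instead of silently falling back to a byte copy, and cp -c on macOS. cp_clone is a hypothetical helper. On macOS, cp -c may fall back to an ordinary copy where cloning is unsupported, so a True result there is not a guarantee that a clone was made.

```python
import subprocess
import sys


def cp_clone(src, dst):
    """Ask the system cp for a copy-on-write clone; True if cp succeeded."""
    if sys.platform == "darwin":
        cmd = ["cp", "-c", "--", src, dst]  # macOS cp: clone via clonefile(2)
    else:
        cmd = ["cp", "--reflink=always", "--", src, dst]  # GNU coreutils cp
    try:
        result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
    except FileNotFoundError:  # no cp executable, e.g. on Windows
        return False
    return result.returncode == 0
```

If it returns False, the caller can fall back to shutil.copy2 to get a plain copy.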

While searching further, I actually found the answer, and a related issue report.

Issue 37157 (shutil: add reflink=False to file copy functions to control clone/CoW copies (use copy_file_range)) is exactly about that; it would use FICLONE / FICLONERANGE on Linux.

So I assume that shutil will support this in an upcoming Python version (maybe starting with Python 3.9?).
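FICLONERANGE, the second ioctl mentioned in issue 37157, works like FICLONE but on a byte range. This also shows that the sharing happens at block/extent granularity rather than per whole file. According to the ioctl_ficlonerange(2) man page, disk filesystems generally require the offsets and length to be aligned to the filesystem block size, and a length of 0 means "to the end of the source file". The sketch below packs struct file_clone_range from <linux/fs.h> by hand. clone_range is a hypothetical helper. If fcntl.FICLONERANGE is missing (it is exposed by Python 3.12+), the sketch falls back to the constant 0x4020940D, which assumes the common x86/ARM ioctl encoding.

```python
import errno
import fcntl
import struct

# fcntl.FICLONERANGE exists on Linux since Python 3.12; otherwise assume the
# x86/ARM encoding of _IOW(0x94, 13, struct file_clone_range).
FICLONERANGE = getattr(fcntl, "FICLONERANGE", 0x4020940D)

# errno values meaning "this filesystem/range cannot be reflinked".
_NOT_SUPPORTED = {errno.EOPNOTSUPP, errno.ENOTTY, errno.ENOSYS,
                  errno.EXDEV, errno.EINVAL}


def clone_range(src_fd, dst_fd, src_offset=0, length=0, dst_offset=0):
    """Share `length` bytes of src_fd (from src_offset) into dst_fd at dst_offset.

    length=0 means "up to the end of the source file". Returns False if the
    filesystem refuses (e.g. no reflink support or an unaligned range).
    """
    # struct file_clone_range { __s64 src_fd; __u64 src_offset;
    #                           __u64 src_length; __u64 dest_offset; };
    arg = struct.pack("=qQQQ", src_fd, src_offset, length, dst_offset)
    try:
        fcntl.ioctl(dst_fd, FICLONERANGE, arg)
    except OSError as e:
        if e.errno in _NOT_SUPPORTED:
            return False
        raise
    return True
```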

There is os.copy_file_range (since Python 3.8), which wraps copy_file_range (Linux).

However, according to issue 37159 (Use copy_file_range() in shutil.copyfile() (server-side copy)), Giampaolo Rodola says:

Nope, [copy_file_range] doesn't [support CoW] (see man page). We can simply use FICLONE (cp does the same).

However, I'm not sure this is correct, as the copy_file_range man page says:

copy_file_range() gives filesystems an opportunity to implement "copy acceleration" techniques, such as the use of reflinks (i.e., two or more inodes that share pointers to the same copy-on-write disk blocks) or server-side-copy (in the case of NFS).
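To try it anyway, here is a minimal sketch that loops over os.copy_file_range (Linux, Python 3.8+) until the whole file has been transferred. As the man page says, the filesystem decides whether that becomes a reflink, a server-side copy, or an ordinary in-kernel byte copy. copy_with_range and its fallback to shutil.copyfile are illustrative choices, not a shutil API.

```python
import errno
import os
import shutil


def copy_with_range(src, dst):
    """Copy src to dst via os.copy_file_range(); fall back to shutil.copyfile.

    Returns True if copy_file_range() performed the copy, False otherwise.
    """
    if hasattr(os, "copy_file_range"):
        with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
            remaining = os.fstat(fsrc.fileno()).st_size
            try:
                while remaining > 0:
                    # No offsets given: both file positions advance per call.
                    n = os.copy_file_range(fsrc.fileno(), fdst.fileno(), remaining)
                    if n == 0:  # reached EOF earlier than expected
                        break
                    remaining -= n
                return True
            except OSError as e:
                # e.g. syscall missing or blocked, cross-filesystem copy,
                # or a filesystem that does not support it
                if e.errno not in (errno.ENOSYS, errno.EPERM, errno.EXDEV,
                                   errno.EOPNOTSUPP, errno.EINVAL):
                    raise
    shutil.copyfile(src, dst)
    return False
```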

Issue 26826 (Expose new copy_file_range() syscall in os module) has this comment by Giampaolo Rodola:

I think data deduplication / CoW / reflink copy is better implemented via FICLONE. "cp --reflink" uses it, I presume because it's older than copy_file_range(). ...
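From Python, the same request cp uses, ioctl(dest_fd, FICLONE, src_fd), can be issued with fcntl.ioctl. The following is a minimal sketch that assumes Linux. fcntl.FICLONE is only exposed by Python 3.12+, so older versions fall back to the numeric value 0x40049409, which assumes the common x86/ARM ioctl encoding and may differ on other architectures. The helper name reflink_or_copy and its fallback to shutil.copyfile are my own choices, not part of any library.

```python
import errno
import os
import shutil
import sys

try:
    import fcntl
except ImportError:  # no fcntl module on Windows
    fcntl = None

# fcntl.FICLONE exists on Linux since Python 3.12; otherwise assume the
# x86/ARM encoding of _IOW(0x94, 9, int).
FICLONE = getattr(fcntl, "FICLONE", 0x40049409)

# errno values meaning "cloning is not possible here", so a plain copy is fine.
_NOT_SUPPORTED = {errno.EOPNOTSUPP, errno.ENOTTY, errno.ENOSYS,
                  errno.EXDEV, errno.EINVAL}


def reflink_or_copy(src, dst):
    """Clone src to dst via FICLONE if possible, else copy the bytes.

    Returns True if a CoW clone was made, False if the data was copied.
    """
    if os.path.exists(dst) and os.path.samefile(src, dst):
        raise shutil.SameFileError(f"{src!r} and {dst!r} are the same file")
    if fcntl is not None and sys.platform.startswith("linux"):
        with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
            try:
                # Same request as `cp --reflink`: ioctl(dest_fd, FICLONE, src_fd)
                fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
                return True
            except OSError as e:
                if e.errno not in _NOT_SUPPORTED:
                    raise
    # Fallback: an ordinary copy of the bytes.
    shutil.copyfile(src, dst)
    return False
```

On BTRFS, or on XFS created with reflink support, this should return True. On filesystems without reflinks (e.g. ext4), it falls back to a normal copy and returns False.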

As already noted in the question, this does not work on ZFS yet; see this issue.

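Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.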

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM