Erasing of already read data from file at the systems level

Original post 2016-07-10 01:27:20 6 1 architecture/ operating-system/ filesystems/ system

Suppose you have a data file that some application is writing to. Once the data is read, it has no further use, so it can be deleted to save space. Here is my design for handling this:

1) Once the file reaches a certain size, acquire a lock on it
2) Look at the index (pointer representing the byte within the file you're currently on) of a read call in the vnode table for this file
3) Delete all file data that comes before the index
4) Update the index of the read call to the new beginning of the file
5) Unlock the file so that reading/writing can resume

I don't have much systems experience myself, but I assume that if this is done at the systems level then it's language independent (i.e., if an application is using a Java call or a Python call to read/write, there is no problem).

The data file is on a Unix v6 file system. Monitoring the size of a file and deleting data is no problem, but I can't find a system call to 1) access other entries in the vnode table to see where they are in reading the file and 2) update the read pointers of these system calls.

1 Answer

A Unix v6 file system doesn't make any sense. I'm sure you don't mean the v6 Unix system released by Bell Labs for the PDP-11 in 1975; for one thing, I'm quite sure it never supported Java or Python! So you probably need to be a bit more specific about what system you mean. It can't be OpenBSD (which hasn't released a version 6 yet), or FreeBSD (which is up to version 10, so version 6 would be quite obsolete). Maybe NetBSD?

In any case, I wouldn't recommend trying to make changes in the kernel to support this, as it would be very non-portable, and far more difficult to get right than you think. In particular, renumbering the mapping between logical and physical blocks is tricky. There are operating systems and file systems (such as Linux's ext4 and xfs) that support the PUNCH HOLE operation, which will deallocate blocks between a specified starting and ending offset, so long as those offsets are multiples of the file system block size. You can even use a COLLAPSE RANGE operation that will "delete" bytes from the beginning or middle of the file; but again, the offsets have to be multiples of the file system block size, and the operation is not going to affect the file offsets of any open file descriptors. COLLAPSE RANGE is in general a really bad idea, since it requires eliminating all of the cached pages for the file in order to maintain page cache consistency. The times when it is useful are extremely rare. The main use case seems to be people who are manipulating really large video files. But architecturally, this is almost certainly not what you want to do.
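For reference, here is a rough sketch of what a PUNCH HOLE call looks like from user space. It is only an illustration, under several assumptions: a 64-bit Linux system, a C library that exports `fallocate(2)`, and a file system that supports hole punching. The helper names `punch_hole` and `demo` are made up for this example, and the flag values are copied from `<linux/falloc.h>`. The helper returns False instead of failing when the operation is unavailable.

```python
import ctypes
import os
import tempfile

# Flag values as defined in <linux/falloc.h>.
FALLOC_FL_KEEP_SIZE = 0x01
FALLOC_FL_PUNCH_HOLE = 0x02


def punch_hole(fd, offset, length):
    """Ask the kernel to deallocate [offset, offset + length) of an open file.

    Returns True on success, False if fallocate() is missing or the call fails
    (for example because the file system does not support hole punching).
    """
    try:
        libc = ctypes.CDLL(None, use_errno=True)
        fallocate = libc.fallocate
    except (OSError, AttributeError, TypeError):
        return False
    # Assumption: off_t is 64 bits wide (true on 64-bit Linux).
    fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                          ctypes.c_int64, ctypes.c_int64]
    fallocate.restype = ctypes.c_int
    mode = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE
    return fallocate(fd, mode, offset, length) == 0


def demo():
    """Write 1 MiB, punch out the first 512 KiB, report what happened."""
    with tempfile.TemporaryFile() as f:
        f.write(b"x" * (1 << 20))
        f.flush()
        ok = punch_hole(f.fileno(), 0, 512 * 1024)  # block-aligned range
        size = os.fstat(f.fileno()).st_size
        f.seek(0)
        head = f.read(16)
    return ok, size, head
```

Note that even when the call succeeds, the file keeps its length and every reader keeps its offset: the punched range simply reads back as zeros. That is why this does not give the "shrinking file" behavior the question asks for.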

What I would recommend is to do this in userspace. Yes, it means you will need to implement support in Java and Python, but trust me, this will be easier than trying to do kernel and file system level hacking. If you really insist, you can create a C library and then create SWIG interfaces that can be called from Java and Python, but it's actually probably easier to reimplement the logic twice in idiomatic, native Java and Python code.

What I would do is have the writer write in chunks of approximately 1 megabyte; once a chunk reaches a certain size, start a new chunk in a new file. Name the chunks numerically, i.e., data0001, data0002, etc. The reader can simply read a chunk and, when it is done with it, delete the chunk file and move on.
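A minimal Python sketch of this chunking scheme might look like the following. The names `ChunkWriter`, `chunk_path`, and `read_finished_chunks` are invented for illustration. The `.tmp` suffix with a rename on completion is an added assumption, not part of the description above: it is one simple way to keep the reader from picking up a chunk that is still being written.

```python
import os
import tempfile

CHUNK_LIMIT = 1 << 20  # roughly 1 megabyte per chunk


def chunk_path(directory, n):
    """Return the path of chunk number n, e.g. data0001."""
    return os.path.join(directory, f"data{n:04d}")


class ChunkWriter:
    """Appends records to numbered chunk files, rolling over at `limit` bytes."""

    def __init__(self, directory, limit=CHUNK_LIMIT):
        self.directory = directory
        self.limit = limit
        self.index = 1
        self._open()

    def _open(self):
        # Assumption: write under a temporary name until the chunk is complete.
        self.size = 0
        self.out = open(chunk_path(self.directory, self.index) + ".tmp", "wb")

    def _finish(self):
        self.out.close()
        final = chunk_path(self.directory, self.index)
        if self.size:
            os.rename(final + ".tmp", final)  # publish the finished chunk
        else:
            os.remove(final + ".tmp")         # nothing was written; drop it

    def write(self, record):
        self.out.write(record)
        self.size += len(record)
        if self.size >= self.limit:
            self._finish()
            self.index += 1
            self._open()

    def close(self):
        self._finish()


def read_finished_chunks(directory, start=1):
    """Yield the contents of each finished chunk in order, deleting it after reading."""
    n = start
    while os.path.exists(chunk_path(directory, n)):
        path = chunk_path(directory, n)
        with open(path, "rb") as f:
            data = f.read()
        os.remove(path)  # the data has been read, so the space can be reclaimed
        yield data
        n += 1
```

A long-running reader would poll for the next chunk number instead of stopping when it is missing; the generator above just drains whatever chunks are already complete.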

Really simple, and it won't get you lost in the weeds of trying to do kernel-level hacking.