Estimate the number of USN records on NTFS volume

When the USN journal is used for the first time, the volume's entire set of USN records must be enumerated using the FSCTL_ENUM_USN_DATA control code. This is usually a lengthy operation.

Is there a way to estimate the number of records on the volume before running the enumeration, so that progress can be displayed?

I'm guessing the USN data for the entire volume is generated from the MFT, with one record per file (approximately). So perhaps a way to estimate the number of active files in the MFT would work.

You can use FSCTL_GET_NTFS_VOLUME_DATA to get the length in bytes of the MFT. If you compare this to the number of records on a selection of representative volumes, you could estimate the average length of a single MFT record and use this to calculate an estimate for the number of records on a particular volume.

Because the MFT contains (for example) the security information for every file, the average length will vary significantly from volume to volume, so I think you'll only get order-of-magnitude accuracy, but it may be good enough in most cases.
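
The calibration idea above can be sketched as plain arithmetic. This is only an illustration: the class and method names are made up for the example, and the sample numbers stand in for measurements you would have to collect yourself from representative volumes.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of any Windows API.
public static class MftRecordEstimator
{
    // Derives an average "MFT bytes per USN record" figure from volumes where
    // both the MFT length and the real record count were measured.
    public static double CalibrateAverage(IEnumerable<(long MftBytes, long Records)> samples)
    {
        long totalBytes = 0, totalRecords = 0;
        foreach (var s in samples)
        {
            totalBytes += s.MftBytes;
            totalRecords += s.Records;
        }
        if (totalRecords <= 0)
            throw new ArgumentException("Need at least one record in the samples.");
        return (double)totalBytes / totalRecords;
    }

    // Applies the calibrated average to the MFT length of the volume at hand.
    // Expect order-of-magnitude accuracy at best.
    public static long EstimateRecordCount(long mftLengthBytes, double avgBytesPerRecord)
    {
        if (mftLengthBytes < 0) throw new ArgumentOutOfRangeException(nameof(mftLengthBytes));
        if (avgBytesPerRecord <= 0) throw new ArgumentOutOfRangeException(nameof(avgBytesPerRecord));
        return (long)Math.Ceiling(mftLengthBytes / avgBytesPerRecord);
    }
}
```

The MFT length passed to EstimateRecordCount would come from FSCTL_GET_NTFS_VOLUME_DATA, as described above.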

Another approach would be to assume that the file reference numbers increase linearly, which is roughly true. You can use FSCTL_ENUM_USN_DATA to find out whether there are any files with a reference number above a particular guess; you'd need no more than 128 guesses to determine the actual maximum reference number. That would at least give you a percentage complete between 0 and 100 at any given point. It wouldn't be entirely uniform, but then progress bars never are. :-)
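
One way to stay under that guess budget is to double an upper bound until the probe fails and then bisect. The sketch below assumes a hypothetical probe delegate, anyAtOrAbove, which you would implement with an FSCTL_ENUM_USN_DATA call starting at the guess; it also assumes the volume holds at least one file and that ids fit in 63 bits.

```csharp
using System;

// Hypothetical helper: the search logic only, with the I/O hidden behind a delegate.
public static class MaxIdSearch
{
    // anyAtOrAbove(g) must return true when at least one file has an id >= g.
    public static ulong FindMax(Func<ulong, bool> anyAtOrAbove, out int probes)
    {
        int count = 0;
        bool Probe(ulong g) { count++; return anyAtOrAbove(g); }

        // Phase 1: double the upper bound until nothing lives at or above it.
        ulong lo = 0;   // assumed: some file has id >= 0 (volume is not empty)
        ulong hi = 1;
        while (Probe(hi))
        {
            lo = hi;
            if (hi == 1UL << 63)
                throw new InvalidOperationException("Ids exceed the assumed 63-bit range.");
            hi <<= 1;
        }

        // Phase 2: bisect. Invariant: a file exists at or above lo, none at or above hi.
        while (hi - lo > 1)
        {
            ulong mid = lo + (hi - lo) / 2;
            if (Probe(mid)) lo = mid; else hi = mid;
        }

        probes = count;
        return lo;   // the largest id for which the probe still succeeds
    }
}
```

With at most 64 doubling probes and at most 63 bisection probes, the total stays below 128.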

Additional:

Looking more closely, on Windows 7 x64 the "next id" field returned by FSCTL_ENUM_USN_DATA (the quadword returned before the first USN_RECORD structure) isn't a file reference number after all, but the file record segment number. So, as you observed, the last id number returned, multiplied by BytesPerFileRecordSegment (1024), is equal to MftValidDataLength.

File reference numbers appear to be made up of two parts. The low six bytes contain the file record segment number. The first record returned from each request always has an FRN whose segment number is the same as the "next id" fed into StartFileReferenceNumber, except for the first call, when StartFileReferenceNumber is zero. The upper two bytes contain unspecified additional information, which is never zero.

It seems that FSCTL_ENUM_USN_DATA accepts either a file record segment number (in which case the top two bytes are zero) or a file reference number (in which case the top two bytes are nonzero).
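
The two-part layout described above can be pulled apart with a mask and a shift. This is a small sketch with made-up helper names. NTFS references commonly describe the upper 16 bits as a sequence number, but the code treats them as an opaque word, in keeping with the observation above.

```csharp
using System;

// Hypothetical helpers for inspecting a 64-bit file reference number (FRN).
public static class FileReference
{
    // Low six bytes (48 bits): the file record segment number.
    private const ulong SegmentMask = 0x0000_FFFF_FFFF_FFFF;

    public static ulong SegmentNumber(ulong frn) => frn & SegmentMask;

    // Upper two bytes (16 bits): the additional information word.
    public static ushort UpperWord(ulong frn) => (ushort)(frn >> 48);

    // True when the value has a zero upper word, i.e. it looks like a bare
    // segment number rather than a full file reference number.
    public static bool LooksLikeBareSegmentNumber(ulong value) => (value >> 48) == 0;
}
```

For example, an FRN of 0x0005000000001A2B splits into segment number 0x1A2B and upper word 5.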

One oddity is that I can't find two records with the same record segment number. This suggests that each file record is using at least 1K in the MFT, which doesn't seem reasonable.

Anyway, the upshot is that it is probably sensible to multiply the "next id" by BytesPerFileRecordSegment and divide it by MftValidDataLength to get the fraction completed, so long as you cope gracefully if this returns a nonsensical result.
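
A minimal sketch of that calculation, with the "cope gracefully" part done by clamping. The class and method names are invented for the example; the three inputs are the "next id" from FSCTL_ENUM_USN_DATA and the two fields from FSCTL_GET_NTFS_VOLUME_DATA.

```csharp
using System;

// Hypothetical helper: turns the "next id" into a progress percentage.
public static class EnumProgress
{
    public static double Percent(ulong nextId, uint bytesPerFileRecordSegment, long mftValidDataLength)
    {
        // Guard against missing or nonsensical volume data.
        if (mftValidDataLength <= 0 || bytesPerFileRecordSegment == 0)
            return 0.0;

        // Work in double so a huge id cannot overflow the multiplication.
        double bytesDone = (double)nextId * bytesPerFileRecordSegment;
        double pct = 100.0 * bytesDone / mftValidDataLength;

        // Clamp anything out of range instead of showing it to the user.
        if (double.IsNaN(pct)) return 0.0;
        return Math.Min(100.0, Math.Max(0.0, pct));
    }
}
```

Clamping means a stale MftValidDataLength (for instance, if the MFT grew during the scan) shows as 100% rather than as an overshoot.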

In fact, the MftValidDataLength field of the NTFS_VOLUME_DATA_BUFFER / NTFS_EXTENDED_VOLUME_DATA structure(s) places an upper limit on the number of USN records that FSCTL_ENUM_USN_DATA will return (assuming, that is, that no additional records are added between the time you compute the estimate and the time you run the enumeration).

In the C# example below, I divide the vd.MftValidDataLength value by vd.BytesPerFileRecordSegment, being sure to round up by first adding divisor - 1 to the dividend before dividing. As for the divisor, I believe that its value here is always 1,024 on any platform or system, in case you prefer to hard-code it.

[Serializable, StructLayout(LayoutKind.Sequential)]
public struct NTFS_EXTENDED_VOLUME_DATA
{
    public VOLUME_ID     /**/ VolumeSerialNumber;
    public long          /**/ NumberSectors;
    public long          /**/ TotalClusters;
    public long          /**/ FreeClusters;
    public long          /**/ TotalReserved;
    public uint          /**/ BytesPerSector;
    public uint          /**/ BytesPerCluster;
    public int           /**/ BytesPerFileRecordSegment;   // <--
    public uint          /**/ ClustersPerFileRecordSegment;
    public long          /**/ MftValidDataLength;          // <--
    public long          /**/ MftStartLcn;
    public long          /**/ Mft2StartLcn;
    public long          /**/ MftZoneStart;
    public long          /**/ MftZoneEnd;
    public uint          /**/ ByteCount;
    public ushort        /**/ MajorVersion;
    public ushort        /**/ MinorVersion;
    public uint          /**/ BytesPerPhysicalSector;
    public ushort        /**/ LfsMajorVersion;
    public ushort        /**/ LfsMinorVersion;
    public uint          /**/ MaxDeviceTrimExtentCount;
    public uint          /**/ MaxDeviceTrimByteCount;
    public uint          /**/ MaxVolumeTrimExtentCount;
    public uint          /**/ MaxVolumeTrimByteCount;
};

Typical constants, abridged for clarity:

public enum FSCTL : uint
{
    // etc...     etc...
    FILESYSTEM_GET_STATISTICS   /**/ = (9 << 16) | 0x0060,
    GET_NTFS_VOLUME_DATA        /**/ = (9 << 16) | 0x0064,  // <--
    GET_NTFS_FILE_RECORD        /**/ = (9 << 16) | 0x0068,
    GET_VOLUME_BITMAP           /**/ = (9 << 16) | 0x006f,
    GET_RETRIEVAL_POINTERS      /**/ = (9 << 16) | 0x0073,
    // etc...     etc...
    ENUM_USN_DATA               /**/ = (9 << 16) | 0x00b3,
    READ_USN_JOURNAL            /**/ = (9 << 16) | 0x00bb,
    // etc...     etc...
    CREATE_USN_JOURNAL          /**/ = (9 << 16) | 0x00e7,
    // etc...     etc...
};

Pseudo-code follows, since everyone has their own favorite ways of doing P/Invoke...

// etc..

if (!GetDeviceIoControl(h_vol, FSCTL.GET_NTFS_VOLUME_DATA, out NTFS_EXTENDED_VOLUME_DATA vd))
    throw new Win32Exception(Marshal.GetLastWin32Error());

var c_mft_estimate = (vd.MftValidDataLength + (vd.BytesPerFileRecordSegment - 1))
                                                        / vd.BytesPerFileRecordSegment;

Great, so what can you do with this value? Unfortunately, knowing this cap on the number of USN records that FSCTL_ENUM_USN_DATA will return doesn't help with choosing a buffer size for the DeviceIoControl/FSCTL_ENUM_USN_DATA call itself, since the USN_RECORD structures returned in each iteration vary in size according to the length of the reported filenames.

So while it is true that, if you happen to provide a buffer large enough for all of the USN_RECORD structures, DeviceIoControl will dutifully provide them all to you in a single call (thus avoiding an iterative-calling loop, which simplifies the code considerably), the little calculation above doesn't give any principled estimate of that buffer size, unless you're willing to settle for using it towards some kind of gross overestimate.

Rather, the value is useful for pre-allocating your own fixed-size data structures, which you'll surely need, prior to the FSCTL_ENUM_USN_DATA enumeration operation. So if you have your own value type that you'll create for each USN entry (a dummy struct, just for example...)

[StructLayout(LayoutKind.Sequential)]
public struct MFT_IX_REC
{
    public ushort seq;
    public ushort parent_ix_hi;
    public uint parent_ix;
};

Then, using the estimate from above, you can pre-allocate an array of these before the DeviceIoControl call and never have to worry about resizing during the iteration.

var med = new MFT_ENUM_DATA { ... };
// ...

var rg_mftix = new MFT_IX_REC[c_mft_estimate];
// ... ready to go, without having to check whether the array needs resizing within the loop

for (int i=0; DeviceIoControl(h_vol, FSCTL.ENUM_USN_DATA, in med, out USN_RECORD usn, ...); i++)
{
    // etc..
    rg_mftix[i].parent_ix = (uint)usn.ParentId;
    // etc..
}

Eliminating the dynamic array resizing that is usually needed when you don't know the number of entries in advance is a non-trivial performance benefit, because it avoids the expensive jumbo-sized memcpy operations required for copying the existing data from the old array to a new, larger one each time you resize.
