
Getting “size on disk” for small files in PowerShell

I'm working with a legacy system that has numerous imports from external systems, most of which function by downloading a file (of varying sizes depending on context), processing it and then storing the file elsewhere on a SAN volume (formatted as NTFS and mounted on a WS2008R2 box). The problem we're having is that the sheer volume of little files ends up wasting large amounts of disk space due to the cluster size.

Ideally we'd locate the worst offending import processes and put in place some automated archiving on the files into .zip files or something similar. Building a report on this should be a relatively simple problem, but I'm struggling to get an accurate "size on disk" (as seen in Explorer). (Yes we could just archive everything after X days, but it's not ideal and doesn't necessarily help tune import processes that could be adapted somewhat to avoid the issue)

I've seen answers like "How to get the actual size-on-disk of a file from PowerShell?", but while they work well for compressed folders, for short files I just get the same value as the file length and so underestimate true disk usage.

The files on the volume vary from some small enough to fit into the MFT records, some which only occupy a small percentage of a cluster and others that are very large. NTFS Compression isn't enabled anywhere on the volume, though a solution which could accommodate that would be more future-proof as we may enable it in future. The volume is normally accessed via a UNC share so if it's possible to determine usage via the share (Explorer seems able to) that would be great, but it's not essential as the script can always run on the server itself and access the drive directly.

You need a little P/Invoke:

Add-Type -TypeDefinition @'
using System;
using System.Runtime.InteropServices;
using System.ComponentModel;
using System.IO;

namespace Win32Functions
{
  public class ExtendedFileInfo
  {    
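    public static long GetFileSizeOnDisk(string file)
    {
        FileInfo info = new FileInfo(file);

        // Cluster size of the volume = sectors per cluster * bytes per sector.
        uint dummy, sectorsPerCluster, bytesPerSector;
        int result = GetDiskFreeSpaceW(info.Directory.Root.FullName, out sectorsPerCluster, out bytesPerSector, out dummy, out dummy);
        if (result == 0) throw new Win32Exception();
        uint clusterSize = sectorsPerCluster * bytesPerSector;

        // The size comes back as two 32-bit halves (high and low).
        uint hosize;
        uint losize = GetCompressedFileSizeW(file, out hosize);
        // 0xFFFFFFFF in the low half signals failure only if the last error is non-zero.
        if (losize == 0xFFFFFFFF)
        {
            int error = Marshal.GetLastWin32Error();
            if (error != 0) throw new Win32Exception(error);
        }
        long size = (long)hosize << 32 | losize;

        // Round up to a whole number of clusters.
        return ((size + clusterSize - 1) / clusterSize) * clusterSize;
    }

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern uint GetCompressedFileSizeW([In, MarshalAs(UnmanagedType.LPWStr)] string lpFileName,
       [Out, MarshalAs(UnmanagedType.U4)] out uint lpFileSizeHigh);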

    [DllImport("kernel32.dll", SetLastError = true, PreserveSig = true)]
    static extern int GetDiskFreeSpaceW([In, MarshalAs(UnmanagedType.LPWStr)] string lpRootPathName,
       out uint lpSectorsPerCluster, out uint lpBytesPerSector, out uint lpNumberOfFreeClusters,
       out uint lpTotalNumberOfClusters);  
  }
}
'@

Use like this:

[Win32Functions.ExtendedFileInfo]::GetFileSizeOnDisk( 'C:\ps\examplefile.exe' )
59580416

It returns the "Size on disk" value shown in the file's Properties dialog in Explorer.
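The rounding step at the end of GetFileSizeOnDisk, and the kind of per-folder report the question asks for, can be sketched in plain PowerShell. This is only an illustration under stated assumptions, not part of the original answer: the function names Get-ClusterRoundedSize and Get-SlackReport are hypothetical, the 4096-byte cluster size is an assumed default, and the report estimates allocation from the file length alone, so it ignores NTFS compression and files small enough to live inside the MFT.

```powershell
# Hypothetical helper: round a byte count up to a whole number of clusters,
# mirroring ((size + clusterSize - 1) / clusterSize) * clusterSize in the C# code.
function Get-ClusterRoundedSize {
    param(
        [long]$Size,
        [long]$ClusterSize = 4096   # assumed default; use the volume's real cluster size
    )
    $remainder = $Size % $ClusterSize
    if ($remainder -eq 0) { return $Size }
    return $Size + $ClusterSize - $remainder
}

# Hypothetical report: estimated slack space per directory, worst offenders first.
function Get-SlackReport {
    param(
        [string]$Path,
        [long]$ClusterSize = 4096
    )
    Get-ChildItem -Path $Path -Recurse |
        Where-Object { -not $_.PSIsContainer } |
        Group-Object -Property DirectoryName |
        ForEach-Object {
            $length = 0L
            $allocated = 0L
            foreach ($f in $_.Group) {
                $length += $f.Length
                $allocated += Get-ClusterRoundedSize -Size $f.Length -ClusterSize $ClusterSize
            }
            New-Object PSObject -Property @{
                Directory = $_.Name
                Files     = $_.Count
                Length    = $length
                Allocated = $allocated
                Wasted    = $allocated - $length
            }
        } |
        Sort-Object -Property Wasted -Descending
}

Get-ClusterRoundedSize -Size 1       # 4096
Get-ClusterRoundedSize -Size 4096    # 4096
Get-ClusterRoundedSize -Size 4097    # 8192
```

If the Win32Functions.ExtendedFileInfo type has been loaded, the call to Get-ClusterRoundedSize inside the loop could be swapped for [Win32Functions.ExtendedFileInfo]::GetFileSizeOnDisk($f.FullName), so that the report reflects what the file system actually reports rather than an estimate.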

With the answer above (by CB), I found the returned size was always 4127 higher than either the correct size on disk or the actual file size (a figure evidently tied to my cluster size of 4096). In the cases where it was above the actual size, the files I tested were either 0 bytes on disk or had a size on disk bigger than the actual size.

I also found that files larger than UInteger.MaxValue (4294967295) were given incorrect sizes, and the code below handles those accurately as well. This required widening the variable types (from UInt32 and Int64 to Double). Note that I've used an arithmetic way of calculating the final size, but see the comments for a bitwise way.

I used the following code to get the most accurate answer. Where it is incorrect, the returned size will be exactly the same as the actual size, which happens if the file is 0 bytes on disk or if the size on disk is bigger:

using System;
using System.Runtime.InteropServices;

public class ExtendedFileInfo
{
    public static double GetFileSizeOnDisk(string file)
    {
        uint hosize;
        uint losize = GetCompressedFileSizeW(file, out hosize);
        double size = (uint.MaxValue + 1L) * hosize + losize;
        return size;
    }
    
    [DllImport("kernel32.dll")]
    static extern uint GetCompressedFileSizeW(
        [In, MarshalAs(UnmanagedType.LPWStr)] string lpFileName,
        [Out, MarshalAs(UnmanagedType.U4)] out uint lpFileSizeHigh);
}

And the VB.Net version:

Imports System
Imports System.Runtime.InteropServices

Public Class ExtendedFileInfo
    Public Shared Function GetFileSizeOnDisk(file As String) As Double
        Dim hosize As UInteger
        Dim losize As UInteger = GetCompressedFileSizeW(file, hosize)
        Dim size As Double = (UInteger.MaxValue + 1) * hosize + losize
        Return size
    End Function

    <DllImport("kernel32.dll")> _
    Private Shared Function GetCompressedFileSizeW(
        <[In], MarshalAs(UnmanagedType.LPWStr)> lpFileName As String,
        <Out, MarshalAs(UnmanagedType.U4)> ByRef lpFileSizeHigh As UInteger) As UInteger
    End Function
End Class
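The arithmetic combination of the two 32-bit halves used above, and the bitwise alternative the answer mentions, can be compared side by side. The following is a minimal PowerShell sketch with hypothetical function names (Join-SizeArithmetic and Join-SizeBitwise are not from the original answers). It assumes PowerShell 3.0 or later, since the -shl operator is not available in 2.0, and it keeps the result in a 64-bit integer rather than a Double.

```powershell
# Hypothetical helpers: combine the high and low 32-bit halves returned by
# GetCompressedFileSizeW into a single 64-bit size.

# Arithmetic form: high * 2^32 + low
function Join-SizeArithmetic {
    param([uint32]$High, [uint32]$Low)
    return [long]$High * 4294967296 + [long]$Low
}

# Bitwise form: (high << 32) | low   (requires PowerShell 3.0+ for -shl)
function Join-SizeBitwise {
    param([uint32]$High, [uint32]$Low)
    return ([long]$High -shl 32) -bor [long]$Low
}

Join-SizeArithmetic -High 1 -Low 5    # 4294967301
Join-SizeBitwise    -High 1 -Low 5    # 4294967301
```

Both forms give the same result for any pair of halves that fits in a signed 64-bit integer, which covers any realistic file size.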


 