
C# GetFiles with Date Filter

Is there a more efficient way to populate a list of file names from a directory, filtered by date?

Currently, I'm doing this:

foreach (FileInfo flInfo in directory.GetFiles())
{
    DateTime yesterday = DateTime.Today.AddDays(-1);
    String name = flInfo.Name.Substring(3,4);
    DateTime creationTime = flInfo.CreationTime;
    if (creationTime.Date == yesterday.Date)
       yesterdaysList.Add(name);
}

This goes through every file in the folder, and I feel like there should be a more efficient way.

First Solution:

You can use LINQ:

List<string> yesterdaysList = directory.GetFiles().Where(x => x.CreationTime.Date == DateTime.Today.AddDays(-1))
                                                  .Select(x => x.Name)
                                                  .ToList();

You can then use this list of names directly. Note that, unlike the code in the question, this query keeps the full file name rather than Name.Substring(3,4).
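As a sketch of how the LINQ query could be tightened: the variant below works out the day's bounds once instead of calling DateTime.Today for every file, and it applies the same Substring(3, 4) as the question. The class name, the method name, the `day` parameter, and the length guard are assumptions added for illustration, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class YesterdayFilter
{
    // Hypothetical helper: Name.Substring(3, 4) for every file created on `day` (local time).
    public static List<string> NamesCreatedOn(DirectoryInfo directory, DateTime day)
    {
        DateTime start = day.Date;          // midnight at the start of the day
        DateTime end = start.AddDays(1);    // midnight at the start of the next day (exclusive)

        return directory.GetFiles()
            .Where(f => f.CreationTime >= start && f.CreationTime < end)
            .Where(f => f.Name.Length >= 7) // assumption: skip names too short for Substring(3, 4)
            .Select(f => f.Name.Substring(3, 4))
            .ToList();
    }
}
```

For yesterday's files the call would be YesterdayFilter.NamesCreatedOn(directory, DateTime.Today.AddDays(-1)). This still lists every file in the folder; it only trims the per-file work.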

Second Solution:

Another way to make it faster could be:

DateTime yesterday = DateTime.Today.AddDays(-1); // compute this once, outside the loop

foreach (FileInfo flInfo in directory.GetFiles()){
    if (flInfo.CreationTime.Date == yesterday.Date) // use flInfo.CreationTime and flInfo.Name directly, without extra local variables
       yesterdaysList.Add(flInfo.Name.Substring(3,4));
}

Benchmark:

I ran a benchmark using this code:

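using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;

class Program {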
    static void Main( string[ ] args ) {
        DirectoryInfo directory = new DirectoryInfo( @"D:\Films" );
        Stopwatch timer = new Stopwatch( );
        timer.Start( );

        for ( int i = 0; i < 100000; i++ ) {
            List<string> yesterdaysList = directory.GetFiles( ).Where( x => x.CreationTime.Date == DateTime.Today.AddDays( -1 ) )
                                              .Select( x => x.Name )
                                              .ToList( );
        }

        timer.Stop( );
        TimeSpan elapsedtime = timer.Elapsed;
        Console.WriteLine( string.Format( "{0:00}:{1:00}:{2:00}", elapsedtime.Minutes, elapsedtime.Seconds, elapsedtime.Milliseconds / 10 ) );
        timer.Restart( );

        DateTime yesterday = DateTime.Today.AddDays( -1 ); // compute this once, outside the loops
        for ( int i = 0; i < 100000; i++ ) {
            List<string> yesterdaysList = new List<string>( );

            foreach ( FileInfo flInfo in directory.GetFiles( ) ) {
                if ( flInfo.CreationTime.Date == yesterday.Date ) // use flInfo.CreationTime and flInfo.Name directly, without extra local variables
                    yesterdaysList.Add( flInfo.Name.Substring( 3, 4 ) );
            }
        }


        timer.Stop( );
        elapsedtime = timer.Elapsed;
        Console.WriteLine( string.Format("{0:00}:{1:00}:{2:00}", elapsedtime.Minutes, elapsedtime.Seconds, elapsedtime.Milliseconds / 10));
        timer.Restart( );

        for ( int i = 0; i < 100000; i++ ) {
            List<string> list = new List<string>( );

            foreach ( FileInfo flInfo in directory.GetFiles( ) ) {
                DateTime _yesterday = DateTime.Today.AddDays( -1 );
                String name = flInfo.Name.Substring( 3, 4 );
                DateTime creationTime = flInfo.CreationTime;
                if ( creationTime.Date == _yesterday.Date )
                    list.Add( name );
            }
        }

        elapsedtime = timer.Elapsed;
        Console.WriteLine( string.Format( "{0:00}:{1:00}:{2:00}", elapsedtime.Minutes, elapsedtime.Seconds, elapsedtime.Milliseconds / 10 ) );
    }
}

Results (printed as minutes:seconds:hundredths of a second):

First solution: 00:19:84
Second solution: 00:17:64
Third solution: 00:19:91 (your original code)

I think you are after more efficiency at the file system level, not at the C# level. If that is the case, the answer is no: there is no way to tell the file system to filter by date. It will needlessly return everything.

If you are after CPU efficiency, this is pointless, because adding items to a listbox is far more expensive than filtering on date. Optimizing your code will yield no results.
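On the file-system point: the filter that GetFiles() does accept is a name search pattern, so a date test still has to run in your own code on whatever comes back. If the files of interest happen to share a name pattern (an assumption; the question doesn't say so), a sketch like the following narrows the candidates by name before checking dates. The class name, method name, and `searchPattern` parameter are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class PatternThenDate
{
    // Hypothetical helper: name pattern first, creation-date range second.
    public static List<string> Names(string directory, string searchPattern,
                                     DateTime minCreated, DateTime maxCreated)
    {
        return new DirectoryInfo(directory)
            .GetFiles(searchPattern)   // name filter only; there is no date parameter
            .Where(fi => fi.CreationTime >= minCreated && fi.CreationTime <= maxCreated)
            .Select(fi => fi.Name)
            .ToList();
    }
}
```

Whether this saves anything depends on how many files the pattern excludes; with a pattern of "*" it is the same work as before.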

I didn't feel like creating enough files with the correct creation date to do a decent benchmark, so I did a more general version that takes a start and end time and gives out the names of files that match. Making it give a particular substring of files created yesterday follows naturally from that.

The quickest single-threaded pure .NET answer I came up with was:

private static IEnumerable<string> FilesWithinDates(string directory, DateTime minCreated, DateTime maxCreated)
{
    foreach(FileInfo fi in new DirectoryInfo(directory).GetFiles())
        if(fi.CreationTime >= minCreated && fi.CreationTime <= maxCreated)
            yield return fi.Name;
}

I would have expected EnumerateFiles() to be slightly faster, but it turned out slightly slower (it might do better if you're going over a network, but I didn't test that).
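The EnumerateFiles() variant isn't shown in the answer, so the following is only a guess at what it would look like: the same loop with the lazy enumeration swapped in. The class name is made up, and reading CreationTime into a local is a small tidy-up rather than something the answer measured.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class LazyDateFilter
{
    // Sketch: same filter as FilesWithinDates above, but entries are streamed
    // by EnumerateFiles() instead of being collected into an array first.
    public static IEnumerable<string> FilesWithinDates(string directory,
                                                       DateTime minCreated, DateTime maxCreated)
    {
        foreach (FileInfo fi in new DirectoryInfo(directory).EnumerateFiles())
        {
            DateTime created = fi.CreationTime; // read once per file
            if (created >= minCreated && created <= maxCreated)
                yield return fi.Name;
        }
    }
}
```

The practical difference is that a caller can start consuming names, or stop early, before the whole directory has been listed.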

There's a slight gain with:

private static ParallelQuery<string> FilesWithinDates(string directory, DateTime minCreated, DateTime maxCreated)
{
    return new DirectoryInfo(directory).GetFiles().AsParallel()
        .Where(fi => fi.CreationTime >= minCreated && fi.CreationTime <= maxCreated)
        .Select(fi => fi.Name);
}

But not much, since it doesn't help the actual call to GetFiles(). If you don't have the cores to use, or there isn't a big enough result from GetFiles(), then it'll just make things worse (the overheads of AsParallel() being greater than the benefit of doing the filtering in parallel). On the other hand, if you can also do your next steps of processing in parallel, then the overall application speed could improve.
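To illustrate the last point, here is a minimal sketch of keeping the follow-up work on the PLINQ worker threads with ForAll() rather than gathering the names first. ProcessMatches and its `process` callback are hypothetical; the callback has to be thread-safe, since it may run on several threads at once.

```csharp
using System;
using System.IO;
using System.Linq;

public static class ParallelDateFilter
{
    // Same query as above, repeated so the sketch compiles on its own.
    public static ParallelQuery<string> FilesWithinDates(string directory,
                                                         DateTime minCreated, DateTime maxCreated)
    {
        return new DirectoryInfo(directory).GetFiles().AsParallel()
            .Where(fi => fi.CreationTime >= minCreated && fi.CreationTime <= maxCreated)
            .Select(fi => fi.Name);
    }

    // Hypothetical next step: run `process` on each matching name in parallel.
    public static void ProcessMatches(string directory, DateTime minCreated,
                                      DateTime maxCreated, Action<string> process)
    {
        FilesWithinDates(directory, minCreated, maxCreated).ForAll(process);
    }
}
```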

There seems to be no point doing this with EnumerateFiles(), because it doesn't seem to parallelise well: it's based on the same approach as the one I come to last, and that's inherently serial, needing the previous result to produce the next.

The fastest I got was:

public const int MAX_PATH = 260;
public const int MAX_ALTERNATE = 14;

[StructLayoutAttribute(LayoutKind.Sequential)]
public struct FILETIME
{
    public uint dwLowDateTime;
    public uint dwHighDateTime;
    public static implicit operator long(FILETIME ft)
    {
        return (((long)ft.dwHighDateTime) << 32) | ft.dwLowDateTime;
    }
};

[StructLayout(LayoutKind.Sequential, CharSet=CharSet.Unicode)]
public struct WIN32_FIND_DATA
{
    public FileAttributes dwFileAttributes;
    public FILETIME ftCreationTime;
    public FILETIME ftLastAccessTime;
    public FILETIME ftLastWriteTime;
    public uint nFileSizeHigh;
    public uint nFileSizeLow;
    public uint dwReserved0;
    public uint dwReserved1;
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst=MAX_PATH)]
    public string cFileName;
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst=MAX_ALTERNATE)]
    public string cAlternate;
}

[DllImport("kernel32", CharSet=CharSet.Unicode)]
public static extern IntPtr FindFirstFile(string lpFileName, out WIN32_FIND_DATA lpFindFileData);

[DllImport("kernel32", CharSet=CharSet.Unicode)]
public static extern bool FindNextFile(IntPtr hFindFile, out WIN32_FIND_DATA lpFindFileData);

[DllImport("kernel32.dll")]
public static extern bool FindClose(IntPtr hFindFile);

private static IEnumerable<string> FilesWithinDates(string directory, DateTime minCreated, DateTime maxCreated)
{
    long startFrom = minCreated.ToFileTimeUtc();
    long endAt = maxCreated.ToFileTimeUtc();
    WIN32_FIND_DATA findData;
    IntPtr findHandle = FindFirstFile(@"\\?\" + directory + @"\*", out findData);
    if(findHandle != new IntPtr(-1))
    {
        do
        {
            if(
                (findData.dwFileAttributes & FileAttributes.Directory) == 0
                &&
                findData.ftCreationTime >= startFrom
                &&
                findData.ftCreationTime <= endAt
            )
            {
                yield return findData.cFileName;
            }
        }
        while(FindNextFile(findHandle, out findData));
        FindClose(findHandle);
    }
}

It's dicey not having that FindClose() promised by an IDisposable, and a hand-rolled implementation of IEnumerator<string> should not only make that easier to do (a serious reason for doing it) but also hopefully shave off like 3 nanoseconds or something (not a serious reason for doing it), but the above shows the basic idea.
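One way to address the FindClose() worry without hand-rolling an enumerator, offered as a sketch rather than as the answerer's method: C# allows yield return inside try/finally, and the finally block runs when the enumerator is disposed, which foreach does even if the caller breaks out early. The class name, the ToLong() helper, and SetLastError = true are my additions. It is Windows-only and assumes `directory` is an absolute path with no trailing backslash.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.InteropServices;

public static class NativeFileFinder
{
    private const int MAX_PATH = 260;
    private const int MAX_ALTERNATE = 14;
    private static readonly IntPtr InvalidHandle = new IntPtr(-1);

    [StructLayout(LayoutKind.Sequential)]
    private struct FILETIME
    {
        public uint dwLowDateTime;
        public uint dwHighDateTime;
        // Combine the two halves into the 64-bit file time.
        public long ToLong() => ((long)dwHighDateTime << 32) | dwLowDateTime;
    }

    [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
    private struct WIN32_FIND_DATA
    {
        public FileAttributes dwFileAttributes;
        public FILETIME ftCreationTime;
        public FILETIME ftLastAccessTime;
        public FILETIME ftLastWriteTime;
        public uint nFileSizeHigh;
        public uint nFileSizeLow;
        public uint dwReserved0;
        public uint dwReserved1;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = MAX_PATH)]
        public string cFileName;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = MAX_ALTERNATE)]
        public string cAlternate;
    }

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern IntPtr FindFirstFile(string lpFileName, out WIN32_FIND_DATA lpFindFileData);

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    private static extern bool FindNextFile(IntPtr hFindFile, out WIN32_FIND_DATA lpFindFileData);

    [DllImport("kernel32.dll", SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    private static extern bool FindClose(IntPtr hFindFile);

    public static IEnumerable<string> FilesWithinDates(string directory,
                                                       DateTime minCreated, DateTime maxCreated)
    {
        long startFrom = minCreated.ToFileTimeUtc();
        long endAt = maxCreated.ToFileTimeUtc();
        WIN32_FIND_DATA findData;
        IntPtr findHandle = FindFirstFile(@"\\?\" + directory + @"\*", out findData);
        if (findHandle == InvalidHandle)
            yield break;

        try
        {
            do
            {
                long created = findData.ftCreationTime.ToLong();
                bool isFile = (findData.dwFileAttributes & FileAttributes.Directory) == 0;
                if (isFile && created >= startFrom && created <= endAt)
                    yield return findData.cFileName;
            }
            while (FindNextFile(findHandle, out findData));
        }
        finally
        {
            // Runs on normal completion and when the enumerator is disposed early.
            FindClose(findHandle);
        }
    }
}
```

A caller that takes the enumerator by hand and never disposes it would still leak the handle, so this covers foreach and LINQ consumers but is not a full substitute for a SafeHandle-based wrapper.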

I use:

DirectoryInfo dI = new DirectoryInfo(fileLocation); 
var files = dI.GetFiles().Where(i=>i.CreationTime>=dateFrom && i.CreationTime<=dateTo);
