
How to speed up Powershell Get-Childitem over UNC

DIR or GCI is slow in PowerShell, but fast in CMD. Is there any way to speed this up?

In CMD.exe, after a sub-second delay, this responds as fast as the CMD window can keep up:

dir \\remote-server.domain.com\share\folder\file*.*

In PowerShell (v2), after a 40+ second delay, this responds with a noticeable slowness (maybe 3-4 lines per second):

gci \\remote-server.domain.com\share\folder\file*.*

I'm trying to scan logs on a remote server, so maybe there's a faster approach.

get-childitem \\$s\logs -include $filemask -recurse | select-string -pattern $regex

Okay, this is how I'm doing it, and it seems to work.

$files = cmd /c "$GETFILESBAT \\$server\logs\$filemask"
foreach( $f in $files ) {
    if( $f.length -gt 0 ) {
        select-string -Path $f -pattern $regex | foreach-object { $_ }
    }
}

Then $GETFILESBAT points to this:

@dir /a-d /b /s %1
@exit

I'm writing and deleting this BAT file from the PowerShell script, so I guess it's a PowerShell-only solution, but it doesn't use only PowerShell.

My preliminary performance metrics show this to be eleventy-thousand times faster.

I tested gci vs. cmd dir vs. FileIO.FileSystem.GetFiles from @Shawn Melton's referenced link.

The bottom line is that, for daily use on local drives, GetFiles is the fastest. By far. CMD DIR is respectable. Once you introduce a slower network connection with many files, CMD DIR is slightly faster than GetFiles. Then Get-ChildItem... wow, this ranges from not too bad to horrible, depending on the number of files involved and the speed of the connection.

Some test runs. I've moved GCI around in the tests to make sure the results were consistent.

10 iterations of scanning c:\windows\temp for *.tmp files

.\test.ps1 "c:\windows\temp" "*.tmp" 10
GetFiles ... 00:00:00.0570057
CMD dir  ... 00:00:00.5360536
GCI      ... 00:00:01.1391139

GetFiles is 10x faster than CMD dir, which itself is more than 2x faster than GCI.

10 iterations of scanning c:\windows\temp for *.tmp files, with recursion

.\test.ps1 "c:\windows\temp" "*.tmp" 10 -recurse
GetFiles ... 00:00:00.7020180
CMD dir  ... 00:00:00.7644196
GCI      ... 00:00:04.7737224

GetFiles is a little faster than CMD dir, and both are almost 7x faster than GCI.

10 iterations of scanning an on-site server on another domain for application log files

.\test.ps1 "\\closeserver\logs\subdir" "appname*.*" 10
GetFiles ... 00:00:00.3590359
CMD dir  ... 00:00:00.6270627
GCI      ... 00:00:06.0796079

GetFiles is about 2x faster than CMD dir, itself 10x faster than GCI.

One iteration of scanning a distant server on another domain for application log files, with many files involved

.\test.ps1 "\\distantserver.company.com\logs\subdir" "appname.2011082*.*"
CMD dir  ... 00:00:00.3340334
GetFiles ... 00:00:00.4360436
GCI      ... 00:11:09.5525579

CMD dir is fastest going to the distant server with many files, but GetFiles is respectably close. GCI, on the other hand, is a couple of thousand times slower.

Two iterations of scanning a distant server on another domain for application log files, with many files

.\test.ps1 "\\distantserver.company.com\logs\subdir" "appname.20110822*.*" 2
CMD dir  ... 00:00:00.9360240
GetFiles ... 00:00:01.4976384
GCI      ... 00:22:17.3068616

More or less a linear increase as test iterations increase.

One iteration of scanning a distant server on another domain for application log files, with fewer files

.\test.ps1 "\\distantserver.company.com\logs\othersubdir" "appname.2011082*.*" 10
GetFiles ... 00:00:00.5304170
CMD dir  ... 00:00:00.6240200
GCI      ... 00:00:01.9656630

Here GCI is not too bad, GetFiles is 3x faster, and CMD dir is close behind.

Conclusion

GCI needs a -raw or -fast option that does not try to do so much. In the meantime, GetFiles is a healthy alternative that is only occasionally a little slower than CMD dir, and usually faster (due to spawning CMD.exe?).

For reference, here's the test.ps1 code.

param ( [string]$path, [string]$filemask, [switch]$recurse=$false, [int]$n=1 )
[reflection.assembly]::loadwithpartialname("Microsoft.VisualBasic") | Out-Null
write-host "GetFiles... " -nonewline
$dt = get-date;
for($i=0;$i -lt $n;$i++){
  if( $recurse ){ [Microsoft.VisualBasic.FileIO.FileSystem]::GetFiles( $path,
      [Microsoft.VisualBasic.FileIO.SearchOption]::SearchAllSubDirectories,$filemask
    )  | out-file ".\testfiles1.txt"}
  else{ [Microsoft.VisualBasic.FileIO.FileSystem]::GetFiles( $path,
      [Microsoft.VisualBasic.FileIO.SearchOption]::SearchTopLevelOnly,$filemask
    )  | out-file ".\testfiles1.txt" }}
$dt2=get-date;
write-host $dt2.subtract($dt)
write-host "CMD dir... " -nonewline
$dt = get-date;
for($i=0;$i -lt $n;$i++){
  if($recurse){
    cmd /c "dir /a-d /b /s $path\$filemask" | out-file ".\testfiles2.txt"}
  else{ cmd /c "dir /a-d /b $path\$filemask" | out-file ".\testfiles2.txt"}}
$dt2=get-date;
write-host $dt2.subtract($dt)
write-host "GCI... " -nonewline
$dt = get-date;
for($i=0;$i -lt $n;$i++){
  if( $recurse ) {
    get-childitem "$path\*" -include $filemask -recurse | out-file ".\testfiles0.txt"}
  else {get-childitem "$path\*" -include $filemask | out-file ".\testfiles0.txt"}}
$dt2=get-date;
write-host $dt2.subtract($dt)

Here is a good explanation by Lee Holmes on why Get-ChildItem is slow. If you take note of the comment from "Anon 11 Mar 2010 11:11 AM" at the bottom of the page, his solution might work for you.

Anon's Code:

# SCOPE: SEARCH A DIRECTORY FOR FILES (W/WILDCARDS IF NECESSARY)
# Usage:
# $directory = "\\SERVER\SHARE"
# $searchterms = "filename[*].ext"
# PS> $Results = Search $directory $searchterms

[reflection.assembly]::loadwithpartialname("Microsoft.VisualBasic") | Out-Null

Function Search {
  # Parameters $Path and $SearchString
  param ([Parameter(Mandatory=$true, ValueFromPipeline = $true)][string]$Path,
  [Parameter(Mandatory=$true)][string]$SearchString
  )
  try {
    #.NET FindInFiles Method to Look for file
    # BENEFITS : Possibly running as background job (haven't looked into it yet)

    [Microsoft.VisualBasic.FileIO.FileSystem]::GetFiles(
    $Path,
    [Microsoft.VisualBasic.FileIO.SearchOption]::SearchAllSubDirectories,
    $SearchString
    )
  } catch { $_ }

}

I tried some of the suggested methods with a large number of files (~190,000). As mentioned in Kyle's comment, GetFiles isn't very useful here, because it takes nearly forever.

cmd dir was better than Get-ChildItem in my first tests, but it seems GCI speeds up a lot if you use the -Force parameter. With this, the time needed was about the same as for cmd dir.
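As a rough sketch of the comparison (the UNC path and file mask below are placeholders, not the actual test targets):

```powershell
# Hypothetical path/mask for illustration only
$path = "\\someserver\logs"
$mask = "appname*.log"

# Default Get-ChildItem
Measure-Command { Get-ChildItem "$path\*" -Include $mask }

# With -Force: also returns hidden and system files, and in these
# tests ran in roughly the same time as cmd /c dir
Measure-Command { Get-ChildItem "$path\*" -Include $mask -Force }
```

Note that -Force changes the result set as well as the speed: hidden and system files are included, so compare counts before comparing timings.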

PS: In my case I had to exclude most of the files because of their extension. This was done with -Exclude in gci and with a | where in the other commands. So the results for just searching files might differ slightly.
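The extension filtering mentioned above looked roughly like this (paths and extensions are made up for the example):

```powershell
# gci: exclude unwanted extensions directly
Get-ChildItem "\\someserver\logs" -Force -Exclude "*.tmp","*.bak"

# cmd dir: no equivalent switch, so pipe through a where filter instead
cmd /c "dir /a-d /b \\someserver\logs" |
    Where-Object { $_ -notmatch '\.(tmp|bak)$' }
```

The extra Where-Object stage adds pipeline overhead to the cmd dir timing, which is why results "for just searching files" may differ slightly.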

Here's an interactive reader that parses cmd /c dir (which can handle UNC paths) and collects the 3 properties most people care about: full path, size, and timestamp.

Usage would be something like $files_with_details = $faster_get_files.GetFileList($unc_compatible_folder)

and there's a helper function to check the combined size: $faster_get_files.GetSize($files_with_details)

$faster_get_files = New-Module -AsCustomObject -ScriptBlock {
    #$DebugPreference = 'Continue' #verbose, this will take figuratively forever
    #$DebugPreference = 'SilentlyContinue'
    $directory_filter = "Directory of (.+)"
    $file_filter = "(\d+/\d+/\d+)\s+(\d+:\d+ \w{2})\s+([\d,]+)\s+(.+)" # [1] is day, [2] is time (AM/PM), [3] is size,  [4] is filename
    $extension_filter = "(.+)[\.](\w{3,4})" # [1] is leaf, [2] is extension
    $directory = ""
    function GetFileList ($directory = $this.directory) {
        if ([System.IO.Directory]::Exists($directory)) {
            # Gather raw file list
            write-Information "Gathering files..."
            $files_raw = cmd /c dir "$directory\*.*" /s /a-d

            # Parse file list
            Write-Information "Parsing file list..."
            $files_with_details = foreach ($line in $files_raw) {
                Write-Debug "starting line {$($line)}"
                Switch -regex ($line) {
                    $this.directory_filter{
                        $directory = $matches[1]
                        break
                    }
                    $this.file_filter {
                        Write-Debug "parsing matches {$($matches.value -join ";")}"
                        $date     = $matches[1]
                        $time     = $matches[2] # am/pm style
                        $size     = $matches[3]
                        $filename = $matches[4]

                        # we do a second match here so as to not append a fake period to files without an extension, otherwise we could do a single match up above
                        Write-Debug "parsing extension from {$($filename)}"
                        if ($filename -match $this.extension_filter) {
                            $file_leaf = $matches[1]
                            $file_extension = $matches[2]
                        } else {
                            $file_leaf = $filename
                            $file_extension = ""
                        }
                        [pscustomobject][ordered]@{
                            "fullname"  = [string]"$($directory)\$($filename)"
                            "filename"  = [string]$filename
                            "folder"    = [string]$directory
                            "file_leaf" = [string]$file_leaf
                            "extension" = [string]$file_extension
                            "date"      = get-date "$($date) $($time)"
                            "size"      = [long]($size -replace ',','') # dir prints sizes with thousands separators
                        }
                        break
                    } 
                } # finish directory/file test
            } # finish all files
            return $files_with_details
        } #finish directory exists test
        else { throw "Directory not found" } # directory doesn't exist
    }
    function GetSize($files_with_details) {
        $combined_size = ($files_with_details|measure -Property size -sum).sum
        $pretty_size_gb = "$([math]::Round($combined_size / 1GB, 4)) GB"
        return $pretty_size_gb
    }
    Export-ModuleMember -Function * -Variable *
}
