How to improve the performance of Write-Progress?
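I am writing a script that takes an output file from another platform (which sadly doesn't produce CSV output; instead it is around 7 lines per record), grabs the lines that hold the values I'm interested in (using Select-String), then scans the MatchInfo array, extracts the exact text, and builds an array that is exported to CSV when finished.
My problem is that the original file has around 94,000 lines of text and the MatchInfo object still holds around 23,500 records, so it takes a while, especially building the array. I thought I would throw in a Write-Progress, but the overhead of doing so is horrific: it increases the elapsed time about 8x compared with having no progress bar.
Here is a sample entry from the original file: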
CREATE TRANCODE MPF OF TXOLID
AGENDA = T4XCLCSHINAG
,ANY_SC_LIST = NONE ,EVERY_SC_LIST = NONE
,SECURITY_CATEGORY = NONE ,FUNCTION = 14
,TRANCODE_VALUE = "MPF"
,TRANCODE_FUNCTION_MNEMONIC = NONE
,INSTALLATION_DATA = NONE
;
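Of these, I only care about the values of AGENDA and TRANCODE_VALUE, so after reading the file in with Get-Content I use Select-String, which is the most efficient way I know of to filter out the rest of the lines in the file: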
rv Start,Filtered,count,CSV
Write-Host "Reading Mainframe Extract File"
$Start = gc K:\TRANCODES.txt
Write-Host ("Read Complete : " + $Start.Count + " records found")
Write-Host "Filtering records for AGENDA/TRANCODE information"
$Filtered = $Start|Select-String -Pattern "AGENDA","TRANCODE_VALUE"
Write-Host ([String]($Filtered.Count/2) + " AGENDA/TRANCODE pairs found")
This leaves me with an object of type Microsoft.PowerShell.Commands.MatchInfo with the following content:
AGENDA = T4XCLCSHINAG
,TRANCODE_VALUE = "MPF"
AGENDA = T4XCLCSHINAG
,TRANCODE_VALUE = "MP"
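Now, the Select-String only took about 9 seconds, so there is no real need for a progress bar there.
However, the next step, grabbing the actual values (after the =) and putting them into an array, takes over 30 seconds, so I figured a Write-Progress would help the user by at least showing that something is happening. But adding the progress bar seriously extends the elapsed time; see the following output from Measure-Command: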
Measure-Command{$Filtered|foreach {If ($_.ToString() -Match 'AGENDA'){$obj = $null;
$obj = New-Object System.Object;
$obj | Add-Member -type NoteProperty -name AGENDA -Value $_.ToString().SubString(27)}
If ($_.ToString() -Match 'TRANCODE_VALUE'){$obj | Add-Member -type NoteProperty -name TRANCODE -Value ($_.ToString().SubString(28)).Replace('"','');
$CSV += $obj;
$obj = $null}
<#$count++
Write-Progress `
-Activity "Building table of values from filter results" `
-Status ("Processed " + $count + " of " + $Filtered.Count + " records") `
-Id 1 `
-PercentComplete ([int]($count/$Filtered.Count *100))#>
}}
TotalSeconds : 32.7902523
So that is 717.2308630680085 records/second.
Measure-Command{$Filtered|foreach {If ($_.ToString() -Match 'AGENDA'){$obj = $null;
$obj = New-Object System.Object;
$obj | Add-Member -type NoteProperty -name AGENDA -Value $_.ToString().SubString(27)}
If ($_.ToString() -Match 'TRANCODE_VALUE'){$obj | Add-Member -type NoteProperty -name TRANCODE -Value ($_.ToString().SubString(28)).Replace('"','');
$CSV += $obj;
$obj = $null}
$count++
Write-Progress `
-Activity "Building table of values from filter results" `
-Status ("Processed " + $count + " of " + $Filtered.Count + " records") `
-Id 1 `
-PercentComplete ([int]($count/$Filtered.Count *100))
}}
TotalSeconds : 261.3469632
Now it is only a paltry 89.98660799693897 records/second.
Any ideas on how to improve the efficiency?
Here is the full script:
rv Start,Filtered,count,CSV
Write-Host "Reading Mainframe Extract File"
$Start = gc K:\TRANCODES.txt
Write-Host ("Read Complete : " + $Start.Count + " records found")
Write-Host "Filtering records for AGENDA/TRANCODE information"
$Filtered = $Start|Select-String -Pattern "AGENDA","TRANCODE_VALUE"
Write-Host ([String]($Filtered.Count/2) + " AGENDA/TRANCODE pairs found")
Write-Host "Building table from the filter results"
[int]$count = 0
$CSV = @()
$Filtered|foreach {If ($_.ToString() -Match 'AGENDA'){$obj = $null;
$obj = New-Object System.Object;
$obj | Add-Member -type NoteProperty -name AGENDA -Value $_.ToString().SubString(27)}
If ($_.ToString() -Match 'TRANCODE_VALUE'){$obj | Add-Member -type NoteProperty -name TRANCODE -Value ($_.ToString().SubString(28)).Replace('"','');
$CSV += $obj;
$obj = $null}
$count++
Write-Progress `
-Activity "Building table of values from filter results" `
-Status ("Processed " + $count + " of " + $Filtered.Count + " records") `
-Id 1 `
-PercentComplete ([int]($count/$Filtered.Count *100))
}
Write-Progress `
-Activity "Building table of values from filter results" `
-Status ("Table built : " + $CSV.Count + " rows created") `
-Id 1 `
-Completed
Write-Host ("Table built : " + $CSV.Count + " rows created")
Write-Host "Sorting and Exporting table to CSV file"
$CSV|Select TRANCODE,AGENDA|Sort TRANCODE |Export-CSV -notype K:\TRANCODES.CSV
Here is the script output with Write-Progress commented out:
Reading Mainframe Extract File
Read Complete : 94082 records found
Filtering records for AGENDA/TRANCODE information
11759 AGENDA/TRANCODE pairs found
Building table from the filter results
Table built : 11759 rows created
Sorting and Exporting table to CSV file
TotalSeconds : 75.2279182
EDIT: I went with a modified version of @RomanKuzmin's answer, so the relevant section of code now looks like this:
Write-Host "Building table from the filter results"
[int]$count = 0
$CSV = @()
$sw = [System.Diagnostics.Stopwatch]::StartNew()
$Filtered|foreach {If ($_.ToString() -Match 'AGENDA'){$obj = $null;
$obj = New-Object System.Object;
$obj | Add-Member -type NoteProperty -name AGENDA -Value $_.ToString().SubString(27)}
If ($_.ToString() -Match 'TRANCODE_VALUE'){$obj | Add-Member -type NoteProperty -name TRANCODE -Value ($_.ToString().SubString(28)).Replace('"','');
$CSV += $obj;
$obj = $null}
$count++
If ($sw.Elapsed.TotalMilliseconds -ge 500) {
Write-Progress `
-Activity "Building table of values from filter results" `
-Status ("Processed " + $count + " of " + $Filtered.Count + " records") `
-Id 1 `
-PercentComplete ([int]($count/$Filtered.Count *100));
$sw.Reset();
$sw.Start()}
}
Write-Progress `
-Activity "Building table of values from filter results" `
-Status ("Table built : " + $CSV.Count + " rows created") `
-Id 1 `
-Completed
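Running the whole script through Measure-Command gives an elapsed time of 75.2279182 seconds with no Write-Progress and, with Write-Progress modified as @RomanKuzmin suggested, 76.525382 seconds. Not bad at all!! :-)
In cases like this, where progress is reported very frequently, I use this approach: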
# fast even with Write-Progress
$sw = [System.Diagnostics.Stopwatch]::StartNew()
for($e = 0; $e -lt 1mb; ++$e) {
if ($sw.Elapsed.TotalMilliseconds -ge 500) {
Write-Progress -Activity Test -Status "Done $e"
$sw.Reset(); $sw.Start()
}
}
# very slow due to Write-Progress
for($e = 0; $e -lt 1mb; ++$e) {
Write-Progress -Activity Test -Status "Done $e"
}
Here is the suggestion on Connect...
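If the same throttling is needed in several loops, the stopwatch check above could be wrapped in a small function. The following is only a sketch: `Write-ThrottledProgress`, its `-IntervalMs` parameter, and the `$script:progressCalls` counter are hypothetical names made up for this illustration and are not part of PowerShell. It assumes PowerShell 3.0 or later (for `[Parameter(Mandatory)]` and `Stopwatch.Restart()`).

```powershell
# Hypothetical helper (not a built-in cmdlet): forwards to Write-Progress
# at most once per $IntervalMs milliseconds and silently returns otherwise.
$script:progressTimer = [System.Diagnostics.Stopwatch]::StartNew()
$script:progressCalls = 0   # illustration only: counts real Write-Progress calls

function Write-ThrottledProgress {
    param(
        [Parameter(Mandatory)] [string] $Activity,
        [Parameter(Mandatory)] [string] $Status,
        [int] $PercentComplete = -1,   # -1 means "no percentage bar"
        [int] $IntervalMs = 500
    )
    # Too soon since the last real update: skip the expensive call.
    if ($script:progressTimer.ElapsedMilliseconds -lt $IntervalMs) { return }

    Write-Progress -Activity $Activity -Status $Status -PercentComplete $PercentComplete
    $script:progressCalls++
    $script:progressTimer.Restart()
}

# Usage: call it on every iteration; only a handful of calls reach Write-Progress.
$total = 100000
for ($i = 1; $i -le $total; $i++) {
    $percent = [int][math]::Floor($i * 100 / $total)
    Write-ThrottledProgress -Activity 'Test' -Status "Done $i of $total" -PercentComplete $percent
}
Write-Progress -Activity 'Test' -Completed
```

A function call per iteration has a cost of its own in PowerShell, so in a very hot loop the inline `if ($sw.Elapsed...)` check shown above is likely to remain the cheaper option.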
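I hope this helps someone else. I spent a day on a similar problem: the progress bar was very, very slow.
However, the root of my problem was that I had set an extremely wide screen buffer for the PowerShell console (9999 instead of the default 120).
This caused Write-Progress to slow to a crawl every time it had to update the GUI progress bar.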
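To check whether this applies to you, the buffer width can be read from the host. This is a minimal sketch that assumes the standard console host; other hosts (for example some IDE or remoting hosts) may not implement `RawUI`, which is why the read is wrapped in `try`/`catch`. The variable name `$width` is just for illustration.

```powershell
# Read the console screen-buffer width; a very large value (such as 9999)
# is the situation described above.
$width = $null
try {
    $width = $Host.UI.RawUI.BufferSize.Width
} catch {
    # Some hosts do not implement RawUI; leave $width as $null.
}

if ($width) {
    Write-Host "Screen buffer width: $width columns"
} else {
    Write-Host "This host does not report a screen buffer width"
}
```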
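I have completely removed my old answer for the sake of efficiency. Although modulus checks are effective enough, they do take time, especially when taking modulus 20 against 5 million iterations; that adds a fair amount of overhead.
For loops, all I do is something simple like the following, which is similar to the stopwatch method in that the progress check is reset each time progress is written: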
$totalDone=0
$finalCount = $objects.count
$progressUpdate = [math]::floor($finalCount / 100)
$progressCheck = $progressUpdate+1
foreach ($object in $objects) {
<<do something with $object>>
$totalDone+=1
If ($progressCheck -gt $progressUpdate){
write-progress -activity "$totalDone out of $finalCount completed" -PercentComplete $(($totalDone / $finalCount) * 100)
$progressCheck = 0
}
$progressCheck += 1
}
The reason I set $progressCheck to $progressUpdate+1 is so that the update runs on the first pass through the loop.
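This method runs a progress update once for every 1% completed. If you want more or fewer updates, just change the divisor from 100 to whatever number you prefer: 200 updates every 0.5%, and 50 updates every 2%.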
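A variation on the same idea, offered only as a sketch and not as the method above: compute the whole-number percentage on each pass and call Write-Progress only when that number changes. The input `1..23518` is placeholder data roughly the size of the question's filtered set, and `$lastPercent` and `$updates` are names introduced here for illustration.

```powershell
$objects     = 1..23518      # placeholder input, roughly the question's record count
$finalCount  = $objects.Count
$totalDone   = 0
$lastPercent = -1            # guarantees an update on the first item
$updates     = 0             # illustration only: counts real Write-Progress calls

foreach ($object in $objects) {
    # ... do something with $object ...
    $totalDone++

    # Whole-number percentage, 0 to 100.
    $percent = [int][math]::Floor($totalDone * 100 / $finalCount)

    # Only touch the progress bar when the displayed percentage would change.
    if ($percent -ne $lastPercent) {
        Write-Progress -Activity 'Processing' -Status "$totalDone of $finalCount" -PercentComplete $percent
        $lastPercent = $percent
        $updates++
    }
}
Write-Progress -Activity 'Processing' -Completed
```

This caps the number of Write-Progress calls at 101 regardless of how many items there are, at the price of one division per iteration, so the counter approach above may still be slightly cheaper per pass.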
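I wanted to use Write-Progress to monitor a Get-ChildItem pipeline that writes to a file. The solution was to start a new job and then watch the job's output for changes from another process. PowerShell makes this very easy.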
# start the job to write the file index to the cache
$job = start-job {
param($path)
Get-ChildItem -Name -Attributes !D -Recurse $path > $path/.hscache
} -arg $(pwd)
# Wake every 200 ms and print the progress to the screen until the job is finished
while( $job.State -ne "Completed") {
Write-Progress -Activity ".hscache-build " -Status $(get-childitem .hscache).length
sleep -m 200
}
# clear the progress bar
Write-Progress -Activity ".hscache-build" -Completed