
How to improve the performance of Write-Progress?

I'm writing a script that takes an output file from another platform (which sadly doesn't produce CSV output; instead it's around 7 lines per record), grabs the lines that have the values I'm interested in (using Select-String), and then scans the MatchInfo array, extracting the exact text and building an array as I go, to export to CSV when finished.

My problem is that the original file has around 94,000 lines of text, and the MatchInfo object still contains around 23,500 records, so it takes a while, especially building the array. I thought I'd throw in a Write-Progress, but the overhead of doing so is quite horrific: it increases the elapsed time roughly 8x compared to not having the progress bar.

Here's an example entry from the original file:

CREATE   TRANCODE   MPF               OF TXOLID
AGENDA                   = T4XCLCSHINAG
,ANY_SC_LIST              = NONE    ,EVERY_SC_LIST            = NONE
,SECURITY_CATEGORY        = NONE    ,FUNCTION                 = 14
,TRANCODE_VALUE           = "MPF"
,TRANCODE_FUNCTION_MNEMONIC = NONE
,INSTALLATION_DATA        = NONE
;

Now, for each of these, I only care about the values of AGENDA and TRANCODE_VALUE, so having read the file in using Get-Content, I then use Select-String as the most efficient way I know to filter out the rest of the lines in the file:

rv Start,Filtered,count,CSV
Write-Host "Reading Mainframe Extract File"
$Start = gc K:\TRANCODES.txt
Write-Host ("Read Complete : " + $Start.Count + " records found")

Write-Host "Filtering records for AGENDA/TRANCODE information"
$Filtered = $Start|Select-String -Pattern "AGENDA","TRANCODE_VALUE"
Write-Host ([String]($Filtered.Count/2) + " AGENDA/TRANCODE pairs found")

This leaves me with an object of type Microsoft.PowerShell.Commands.MatchInfo with contents like:

AGENDA                   = T4XCLCSHINAG
,TRANCODE_VALUE           = "MPF"
AGENDA                   = T4XCLCSHINAG
,TRANCODE_VALUE           = "MP"

Now, that Select-String only took around 9 seconds, so there's no real need for a progress bar there.

However, the next step, grabbing the actual values (after the =) and putting them in an array, takes over 30 seconds, so I figured a Write-Progress would be helpful to the user and would at least show that something is actually happening. But the addition of the progress bar seriously extends the elapsed time; see the following output from Measure-Command:

Measure-Command{$Filtered|foreach {If ($_.ToString() -Match 'AGENDA'){$obj = $null;  
                                    $obj = New-Object System.Object;  
                                    $obj | Add-Member -type NoteProperty -name AGENDA -Value $_.ToString().SubString(27)}  
            If ($_.ToString() -Match 'TRANCODE_VALUE'){$obj | Add-Member -type NoteProperty -name TRANCODE -Value ($_.ToString().SubString(28)).Replace('"','');
                                            $CSV += $obj;
                                            $obj = $null}
                           <#$count++     
                           Write-Progress `
                           -Activity "Building table of values from filter results" `
                           -Status ("Processed " + $count + " of " + $Filtered.Count + " records") `
                           -Id 1 `
                           -PercentComplete ([int]($count/$Filtered.Count *100))#>
                        }}



TotalSeconds      : 32.7902523   

So that's 717.2308630680085 records/sec

Measure-Command{$Filtered|foreach {If ($_.ToString() -Match 'AGENDA'){$obj = $null;  
                                    $obj = New-Object System.Object;  
                                    $obj | Add-Member -type NoteProperty -name AGENDA -Value $_.ToString().SubString(27)}  
            If ($_.ToString() -Match 'TRANCODE_VALUE'){$obj | Add-Member -type NoteProperty -name TRANCODE -Value ($_.ToString().SubString(28)).Replace('"','');
                                            $CSV += $obj;
                                            $obj = $null}
                           $count++     
                           Write-Progress `
                           -Activity "Building table of values from filter results" `
                           -Status ("Processed " + $count + " of " + $Filtered.Count + " records") `
                           -Id 1 `
                           -PercentComplete ([int]($count/$Filtered.Count *100))
                        }}


TotalSeconds      : 261.3469632  

Now only a paltry 89.98660799693897 records/sec

Any ideas how to improve the efficiency?

Here's the full script as-is:

rv Start,Filtered,count,CSV  
Write-Host "Reading Mainframe Extract File"  
$Start = gc K:\TRANCODES.txt  
Write-Host ("Read Complete : " + $Start.Count + " records found")  

Write-Host "Filtering records for AGENDA/TRANCODE information"  
$Filtered = $Start|Select-String -Pattern "AGENDA","TRANCODE_VALUE"  
Write-Host ([String]($Filtered.Count/2) + " AGENDA/TRANCODE pairs found")  

Write-Host "Building table from the filter results"  
[int]$count = 0  
$CSV = @()  
$Filtered|foreach {If ($_.ToString() -Match 'AGENDA'){$obj = $null;  
                                    $obj = New-Object System.Object;  
                                    $obj | Add-Member -type NoteProperty -name AGENDA -Value $_.ToString().SubString(27)}  
            If ($_.ToString() -Match 'TRANCODE_VALUE'){$obj | Add-Member -type NoteProperty -name TRANCODE -Value ($_.ToString().SubString(28)).Replace('"','');  
                                            $CSV += $obj;  
                                            $obj = $null}  
                           $count++     
                           Write-Progress `  
                           -Activity "Building table of values from filter results" `  
                           -Status ("Processed " + $count + " of " + $Filtered.Count + " records") `  
                           -Id 1 `  
                           -PercentComplete ([int]($count/$Filtered.Count *100))  
                        }  
     Write-Progress `  
     -Activity "Building table of values from filter results" `  
     -Status ("Table built : " + $CSV.Count + " rows created") `  
     -Id 1 `  
     -Completed  

Write-Host ("Table built : " + $CSV.Count + " rows created")  

Write-Host "Sorting and Exporting table to CSV file"  

$CSV|Select TRANCODE,AGENDA|Sort TRANCODE |Export-CSV -notype K:\TRANCODES.CSV  

Here's the output from the script with the Write-Progress commented out:

Reading Mainframe Extract File
Read Complete : 94082 records found
Filtering records for AGENDA/TRANCODE information
11759 AGENDA/TRANCODE pairs found
Building table from the filter results
Table built : 11759 rows created
Sorting and Exporting table to CSV file

TotalSeconds      : 75.2279182  

EDIT: I've adopted a modified version of the answer from @RomanKuzmin, so the appropriate code section now looks like:

Write-Host "Building table from the filter results"
[int]$count = 0
$CSV = @()
$sw = [System.Diagnostics.Stopwatch]::StartNew()
$Filtered|foreach {If ($_.ToString() -Match 'AGENDA'){$obj = $null;
                                        $obj = New-Object System.Object;
                                        $obj | Add-Member -type NoteProperty -name AGENDA -Value $_.ToString().SubString(27)}
                If ($_.ToString() -Match 'TRANCODE_VALUE'){$obj | Add-Member -type NoteProperty -name TRANCODE -Value ($_.ToString().SubString(28)).Replace('"','');
                                                $CSV += $obj;
                                                $obj = $null}
                               $count++     
                               If ($sw.Elapsed.TotalMilliseconds -ge 500) {
                               Write-Progress `
                               -Activity "Building table of values from filter results" `
                               -Status ("Processed " + $count + " of " + $Filtered.Count + " records") `
                               -Id 1 `
                               -PercentComplete ([int]($count/$Filtered.Count *100));
                               $sw.Reset();
                               $sw.Start()}
                            }
         Write-Progress `
         -Activity "Building table of values from filter results" `
         -Status ("Table built : " + $CSV.Count + " rows created") `
         -Id 1 `
         -Completed

Running the entire script through Measure-Command gives an elapsed time of 75.2279182 seconds with no Write-Progress, and 76.525382 seconds with the modified Write-Progress using @RomanKuzmin's suggestion - not bad at all! :-)

In such cases, when progress is reported too often, I use this approach:

# fast even with Write-Progress
$sw = [System.Diagnostics.Stopwatch]::StartNew()
for($e = 0; $e -lt 1mb; ++$e) {
    if ($sw.Elapsed.TotalMilliseconds -ge 500) {
        Write-Progress -Activity Test -Status "Done $e"
        $sw.Reset(); $sw.Start()
    }
}

# very slow due to Write-Progress
for($e = 0; $e -lt 1mb; ++$e) {
    Write-Progress -Activity Test -Status "Done $e"
}

Here is the suggestion on Connect...

I hope this helps someone else. I spent a day on a similar problem: the progress bar was very, very slow.

My problem, however, was rooted in the fact that I had made the screen buffer for the PowerShell console extremely wide (9999 instead of the default 120).

This caused Write-Progress to slow down to the extreme every time it had to update the GUI progress bar.
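
As a quick sanity check (a minimal sketch, not part of the original answer), you can inspect the console buffer width via $Host.UI.RawUI and shrink it back toward the default before running a loop that reports progress:

# Inspect the current console buffer width; a very wide buffer (e.g. 9999)
# makes every Write-Progress redraw much more expensive.
$buffer = $Host.UI.RawUI.BufferSize
Write-Host ("Current buffer width: " + $buffer.Width)

# Optionally shrink it back toward the default of 120. The buffer cannot be
# narrower than the current window, so clamp against the window width.
$buffer.Width = [Math]::Max(120, $Host.UI.RawUI.WindowSize.Width)
$Host.UI.RawUI.BufferSize = $buffer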

I completely removed my old answer for the sake of efficiency. Although modulus checks are efficient enough, they do take time, especially when doing a modulus-20 check against, say, 5 million iterations - this adds a decent amount of overhead.

For loops, all I do is something simple, as follows (which is similar to the stopwatch method): reset your progress check with each Write-Progress:

$totalDone=0
$finalCount = $objects.count
$progressUpdate = [math]::floor($finalCount / 100)
$progressCheck = $progressUpdate+1
foreach ($object in $objects) {
    <<do something with $object>>
    $totalDone+=1
    If ($progressCheck -gt $progressUpdate){
        write-progress -activity "$totalDone out of $finalCount completed" -PercentComplete $(($totalDone / $finalCount) * 100)
        $progressCheck = 0
    }
    $progressCheck += 1
}

The reason I set $progressCheck to $progressUpdate+1 is so that the progress update runs the first time through the loop.

This method will run a progress update every 1% of completion. If you want more or less, just update the division from 100 to your preferred number. 200 would mean an update every 0.5%, and 50 would mean an update every 2%.
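
For example (a minimal sketch; $updatesPerRun is an illustrative name, not something from the answer above), pulling the divisor out into a variable makes the update frequency explicit:

# 100 => update every 1%, 200 => every 0.5%, 50 => every 2%
$updatesPerRun  = 200
$progressUpdate = [math]::Floor($finalCount / $updatesPerRun)
$progressCheck  = $progressUpdate + 1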

I wanted to use Write-Progress to monitor the piping of Get-ChildItem to a file. The solution was to start a new job and then monitor the job's output for changes from another process. PowerShell makes this quite easy.

# Start the job to write the file index to the cache
$job = Start-Job {
    param($path)
    Get-ChildItem -Name -Attributes !D -Recurse $path > $path/.hscache
} -ArgumentList $(pwd)

# Wake every 200 ms and print the progress to the screen until the job is finished
while ($job.State -ne "Completed") {
    Write-Progress -Activity ".hscache-build" -Status $(Get-ChildItem .hscache).Length
    Start-Sleep -Milliseconds 200
}

# Clear the progress bar
Write-Progress -Activity ".hscache-build" -Completed

