Can I optimize this PowerShell parser?

Question

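I wrote a PowerShell script that reads delimited files and iterates over them line by line. The script saves the attribute values into variables and writes those values to a file whenever it encounters the end-of-record string.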

Is there a way to optimize this script to make it faster, assuming no .Net framework is installed?

rm storage.txt
$job_counter = 0;
$att_counter = 0;

foreach ($line in Get-Content .\a.txt) {
    if ($line -match '^end$') {
        $job_counter++;
    }
}

echo "File has $job_counter jobs"

$job_counter = 0;

foreach ($line in Get-Content .\a.txt) {
    if ($line -notmatch '^end$') {
        $line_header = ($line.Split(":")[0])
        $line_value = ($line.Split(":")[1])
        switch ($line_header) {
            insert_job      {$insert_job      = $line_value.trim();break}
            job_type        {$job_type        = $line_value.trim();break}
            command         {$command         = $line_value.trim();break}
            machine         {$machine         = $line_value.trim();break}
            owner           {$owner           = $line_value.trim();break}
            permission      {$permission      = $line_value.trim();break}
            date_conditions {$date_conditions = $line_value.trim();break}
            days_of_week    {$days_of_week    = $line_value.trim();break}
            start_times     {$start_times     = $line_value.trim();break}
            description     {$description     = $line_value.trim();break}
            std_out_file    {$std_out_file    = $line_value.trim();break}
            std_err_file    {$std_err_file    = $line_value.trim();break}
            alarm_if_fail   {$alarm_if_fail   = $line_value.trim();break}
            end             {$end             = $line_value.trim();break}
            box_name        {$box_name        = $line_value.trim();break}
            condition       {$condition       = $line_value.trim();break}
            run_window      {$run_window      = $line_value.trim();break}
            n_retrys        {$n_retrys        = $line_value.trim();break}
            term_run_time   {$term_run_time   = $line_value.trim();break}
            box_terminator  {$box_terminator  = $line_value.trim();break}
            job_terminator  {$job_terminator  = $line_value.trim();break}
            min_run_alarm   {$min_run_alarm   = $line_value.trim();break}
            max_run_alarm   {$max_run_alarm   = $line_value.trim();break}
            profile         {$profile_name    = $line_value.trim();break}
        }
        $att_counter++;
    } else {
        $job_counter++
        echo "encountered job number $job_counter, it has $att_counter attributes"
        echo "'$insert_job','$job_type','$command','$machine','$owner','$permission','$date_conditions','$days_of_week','$start_times','$description','$std_out_file','$std_err_file','$alarm_if_fail','$end','$box_name','$condition','$run_window','$n_retrys','$term_run_time','$box_terminator','$job_terminator','$min_run_alarm','$max_run_alarm','$profile_name'" >>storage.txt

        Clear-Variable -Name "insert_job";
        Clear-Variable -Name "job_type";
        Clear-Variable -Name "command";
        Clear-Variable -Name "machine";
        Clear-Variable -Name "owner";
        Clear-Variable -Name "permission";
        Clear-Variable -Name "date_conditions";
        Clear-Variable -Name "days_of_week";
        Clear-Variable -Name "start_times";
        Clear-Variable -Name "description";
        Clear-Variable -Name "std_out_file";
        Clear-Variable -Name "std_err_file";
        Clear-Variable -Name "alarm_if_fail";
        Clear-Variable -Name "end";
        Clear-Variable -Name "box_name";
        Clear-Variable -Name "condition";
        Clear-Variable -Name "run_window";
        Clear-Variable -Name "n_retrys";
        Clear-Variable -Name "term_run_time";
        Clear-Variable -Name "box_terminator";
        Clear-Variable -Name "job_terminator";
        Clear-Variable -Name "min_run_alarm";
        Clear-Variable -Name "max_run_alarm";
        Clear-Variable -Name "profile_name";

        $att_counter = 0;
    }
}

Answer 1

Get-Content .\a.txt is quite slow; replacing it with [system.io.file]::ReadAllLines('c:\full\path\to\file\a.txt') will be much faster.
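One detail behind the full path in that call: .NET methods resolve relative paths against the process working directory, which is not necessarily PowerShell's current location. A minimal sketch of one way to handle that, assuming a hypothetical sample file created in the temp directory purely for illustration:

```powershell
# Hypothetical sample file, created only so the sketch is self-contained.
$sample = Join-Path ([System.IO.Path]::GetTempPath()) 'a_sample.txt'
Set-Content -LiteralPath $sample -Value 'insert_job: job1', 'job_type: c', 'end'

# Convert-Path turns a PowerShell path (relative or absolute) into a full filesystem path.
$fullPath = Convert-Path -LiteralPath $sample

# ReadAllLines returns a string[] with one element per line of the file.
$allLines = [System.IO.File]::ReadAllLines($fullPath)
$allLines.Count
```

In the real script, passing .\a.txt to Convert-Path in the same way should let the rest of the code keep using a relative path.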

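Get rid of the first loop entirely and don't echo the number of jobs at all. If you must have it, push the loop down the stack: load the lines once, use $jobCount = ($LinesLoadedOnce -match '^end$').Count, and let -match do the looping/filtering instead of foreach.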
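To illustrate the behaviour this relies on: when the left-hand side of -match is an array, the operator returns the matching elements rather than a Boolean, so .Count gives the number of jobs. The sample lines below are made up, and the @( ) wrapper is just an extra guard to always get an array back.

```powershell
# Made-up sample input standing in for the lines loaded once from the file.
$LinesLoadedOnce = @(
    'insert_job: job1'
    'command: echo end'
    'end'
    'insert_job: job2'
    'end'
)

# With an array on the left, -match acts as a filter and returns the matching elements.
$jobCount = @($LinesLoadedOnce -match '^end$').Count
"File has $jobCount jobs"
```

Only lines consisting of exactly 'end' are counted; a value that merely contains the word, as in the second sample line, is not.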

Instead of calling Clear-Variable many times for every job and paying the cmdlet startup overhead each time, call it once and pass it an array of the names to clear, e.g. Clear-Variable -Name "insert_job", "job_type", "command", ..
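A small sketch of that single call, using only three of the script's variables with dummy values; the $namesToClear variable is a name introduced here for illustration.

```powershell
# A few of the script's variables, given dummy values for the demonstration.
$insert_job = 'job1'
$job_type   = 'c'
$command    = 'echo hello'

# Keep the names in one array so the list is written only once...
$namesToClear = 'insert_job', 'job_type', 'command'

# ...and clear them all with a single cmdlet invocation.
Clear-Variable -Name $namesToClear

# The variables still exist afterwards, but their values are now $null.
$null -eq $insert_job
```

Defining the array once outside the loop also avoids repeating the long list of names in the loop body.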

Instead of using >>storage.txt, which opens and closes the text file once per line, collect the output into an array and write it to the file in one go with Set-Content:

$results = foreach ($line in [system.io.file]::ReadAllLines('c:\full\path\to\file\a.txt'))
{
    #code here

    "'$insert_job','$job_type','$command', .."

}

$results | Set-Content -Path storage.txt

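The rest depends more on the format of the file, its size, and whether some lines need to be skipped, but it might turn into something like this: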

$headers = @(
    'insert_job'
    'job_type'
    'command'
    'machine'
    'owner'
    'permission'
    'date_conditions'
    'days_of_week'
    'start_times'
    'description'
    'std_out_file'
    'std_err_file'
    'alarm_if_fail'
    'end'
    'box_name'
    'condition'
    'run_window'
    'n_retrys'
    'term_run_time'
    'box_terminator'
    'job_terminator'
    'min_run_alarm'
    'max_run_alarm'
    'profile'
)

$headerRegex = "^($($headers -join '|'))\s*:\s*(.*?)\s*$"

$data = [ordered]@{}
foreach($h in $headers) { $data[$h] = $null }

$results = foreach ($line in [system.io.file]::ReadAllLines('c:\full\path\to\file\a.txt')) {

    if ($line -match $headerRegex) {
        $data[$matches[1]] = $matches[2]
    }
    elseif ($line -eq 'end') {
        [PSCustomObject]$data
        $data = [ordered]@{}
        foreach($h in $headers) { $data[$h] = $null }
    }
}

$results | Export-Csv storage.txt -NoTypeInformation

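This pushes more of the work into the regex engine, does less string handling and string interpolation, uses fewer variables and fewer cmdlets, avoids all the switch and break jumping, and will probably run faster.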
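To see what the regex does with a single line, here is a minimal sketch with a shortened header list and a made-up input line. The value deliberately contains a second colon: the pattern keeps everything after the first separator, whereas Split(":")[1] in the question's script would cut such a value off at the next colon.

```powershell
# Shortened header list for illustration; the full list works the same way.
$headers = 'insert_job', 'job_type', 'command'
$headerRegex = "^($($headers -join '|'))\s*:\s*(.*?)\s*$"

# Made-up input line with extra whitespace and a colon inside the value.
$line = 'command :   C:\scripts\run.bat /all   '

if ($line -match $headerRegex) {
    $name  = $matches[1]   # the header name captured by the first group
    $value = $matches[2]   # the value, with surrounding whitespace excluded by the pattern
}
```

Because the trimming happens inside the pattern, no separate .trim() call is needed for each field.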

I haven't tested it, because I don't know what your file contains.

Answer 2

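There are good pointers in the comments and in TessellatingHeckler's answer, but let me try to put them all together along with some other speed improvements; see the comments in the code.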

rm storage.txt
$job_counter = 0
$att_counter = 0

# Read all lines into memory up front, using the .NET framework directly
# which is much faster than using Get-Content.    
$allLines = [IO.File]::ReadAllLines("$PWD/a.txt")

# Count the lines that contain 'end' exactly to determine
# the number of jobs.
$job_counter = ($allLines -eq 'end').Count

"File has $job_counter jobs"

$job_counter = 0

# Initialize the hashtable whose entries will contain the values
# (rather than individual variables).
$values = [ordered] @{}

# Use a `switch` statement to process the lines, which is generally
# faster than a `foreach` loop.
switch ($allLines) {
  'end' {  # Use string equality (not regex matching), which is faster.
    $job_counter++
    "encountered job number $job_counter, it has $att_counter attributes"
    # Write the values to the output file.
    "'$($values.insert_job)','$($values.job_type)','$($values.command)','$($values.machine)','$($values.owner)','$($values.permission)','$($values.date_conditions)','$($values.days_of_week)','$($values.start_times)','$($values.description)','$($values.std_out_file)','$($values.std_err_file)','$($values.alarm_if_fail)','$($values.end)','$($values.box_name)','$($values.condition)','$($values.run_window)','$($values.n_retrys)','$($values.term_run_time)','$($values.box_terminator)','$($values.job_terminator)','$($values.min_run_alarm)','$($values.max_run_alarm)','$($values.profile)'" >>storage.txt
    # Clear the values from the hashtable and reset the per-job attribute count.
    $values.Clear()
    $att_counter = 0
  }
  default { 
    # Split the line at hand into field name and value...
    $fieldName, $value = $_ -split ':', 2, 'SimpleMatch'
    # ... and add an entry for the pair to the hashtable.
    $values.$fieldName = $value.trim()
    $att_counter++
  }
}

Note: the solution above does not validate the field names found in the input file. Doing so would require extra work.
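One possible shape for that extra work, as a rough sketch only: check each field name against a lookup table before storing it. The $knownFields table, the $unknown list, and the sample lines are all hypothetical names and data introduced for illustration.

```powershell
# Hypothetical list of accepted names, shortened for illustration.
$knownFields = @{}
foreach ($name in 'insert_job', 'job_type', 'command') { $knownFields[$name] = $true }

# Made-up sample lines; 'bogus_field' is deliberately not in the list.
$sampleLines = 'insert_job: job1', 'bogus_field: x', 'command: C:\run.bat'

$values  = [ordered]@{}
$unknown = @()
foreach ($line in $sampleLines) {
    # Split at the first colon only, so colons inside the value are preserved.
    $fieldName, $value = $line -split ':', 2, 'SimpleMatch'
    if ($knownFields.ContainsKey($fieldName)) {
        $values[$fieldName] = $value.Trim()
    } else {
        # Collect unrecognized names instead of silently storing them.
        $unknown += $fieldName
    }
}
```

Whether unknown names should be skipped, logged, or treated as an error depends on the input file, so the sketch only collects them.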

2 solutions

Solution 1
0 2019-02-22 04:57:35

Solution 2
0 2019-02-22 20:14:36
