Can I optimize this PowerShell parser?
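I wrote a PowerShell script that reads delimited files and iterates over them line by line. The script stores attribute values in variables and, when it encounters the end-of-record string, writes those values to a file.
Assuming the .NET Framework is not installed, is there a way to optimize this script for speed?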
rm storage.txt
$job_counter = 0;
$att_counter = 0;
foreach ($line in Get-Content .\a.txt) {
if ($line -match '^end$') {
$job_counter++;
}
}
echo "File has $job_counter jobs"
$job_counter = 0;
foreach ($line in Get-Content .\a.txt) {
if ($line -notmatch '^end$') {
$line_header = ($line.Split(":")[0])
$line_value = ($line.Split(":")[1])
switch ($line_header) {
insert_job {$insert_job = $line_value.trim();break}
job_type {$job_type = $line_value.trim();break}
command {$command = $line_value.trim();break}
machine {$machine = $line_value.trim();break}
owner {$owner = $line_value.trim();break}
permission {$permission = $line_value.trim();break}
date_conditions {$date_conditions = $line_value.trim();break}
days_of_week {$days_of_week = $line_value.trim();break}
start_times {$start_times = $line_value.trim();break}
description {$description = $line_value.trim();break}
std_out_file {$std_out_file = $line_value.trim();break}
std_err_file {$std_err_file = $line_value.trim();break}
alarm_if_fail {$alarm_if_fail = $line_value.trim();break}
end {$end = $line_value.trim();break}
box_name {$box_name = $line_value.trim();break}
condition {$condition = $line_value.trim();break}
run_window {$run_window = $line_value.trim();break}
n_retrys {$n_retrys = $line_value.trim();break}
term_run_time {$term_run_time = $line_value.trim();break}
box_terminator {$box_terminator = $line_value.trim();break}
job_terminator {$job_terminator = $line_value.trim();break}
min_run_alarm {$min_run_alarm = $line_value.trim();break}
max_run_alarm {$max_run_alarm = $line_value.trim();break}
profile {$profile_name = $line_value.trim();break}
}
$att_counter++;
} else {
$job_counter++
echo "encountered job number $job_counter, it has $att_counter attributes"
echo "'$insert_job','$job_type','$command','$machine','$owner','$permission','$date_conditions','$days_of_week','$start_times','$description','$std_out_file','$std_err_file','$alarm_if_fail','$end','$box_name','$condition','$run_window','$n_retrys','$term_run_time','$box_terminator','$job_terminator','$min_run_alarm','$max_run_alarm','$profile_name'" >>storage.txt
Clear-Variable -Name "insert_job";
Clear-Variable -Name "job_type";
Clear-Variable -Name "command";
Clear-Variable -Name "machine";
Clear-Variable -Name "owner";
Clear-Variable -Name "permission";
Clear-Variable -Name "date_conditions";
Clear-Variable -Name "days_of_week";
Clear-Variable -Name "start_times";
Clear-Variable -Name "description";
Clear-Variable -Name "std_out_file";
Clear-Variable -Name "std_err_file";
Clear-Variable -Name "alarm_if_fail";
Clear-Variable -Name "end";
Clear-Variable -Name "box_name";
Clear-Variable -Name "condition";
Clear-Variable -Name "run_window";
Clear-Variable -Name "n_retrys";
Clear-Variable -Name "term_run_time";
Clear-Variable -Name "box_terminator";
Clear-Variable -Name "job_terminator";
Clear-Variable -Name "min_run_alarm";
Clear-Variable -Name "max_run_alarm";
Clear-Variable -Name "profile_name";
$att_counter = 0;
}
}
Get-Content .\a.txt is slow; replacing it with [system.io.file]::ReadAllLines('c:\full\path\to\file\a.txt') will be much faster.
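One detail worth spelling out: .NET methods resolve relative paths against the process working directory, which is not necessarily PowerShell's current location, so a full path is the safer input. The following is a minimal sketch of that, using a throwaway sample file whose name and contents are made up for illustration.

```powershell
# Create a small throwaway file so the sketch is self-contained.
# The file name and contents are hypothetical.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'a-sample.txt'
Set-Content -Path $path -Value 'insert_job: job_a', 'end'

# Convert-Path returns a full file-system path that .NET can use directly.
$fullPath = Convert-Path -LiteralPath $path

# Read every line into a string array in a single call.
$lines = [System.IO.File]::ReadAllLines($fullPath)
$lineCount = $lines.Count

# Clean up the sample file.
Remove-Item -LiteralPath $path
```

For a file in the current directory, Convert-Path .\a.txt would produce the full path to pass to ReadAllLines.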
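Get rid of the entire first loop; don't echo how many jobs there are at all. If you must, push the loop down into the engine: use $jobCount = ($LinesLoadedOnce -match '^end$').Count, letting -match do the looping/filtering instead of foreach.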
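As a minimal sketch of that idea: when the left-hand side of -match is an array, the operator acts as a filter and returns the matching elements, so .Count gives the number of jobs without an explicit loop. The sample lines below are made up for illustration.

```powershell
# Hypothetical sample input; in the real script these would be the lines of a.txt.
$LinesLoadedOnce = @(
    'insert_job: job_a'
    'machine: host1'
    'end'
    'insert_job: job_b'
    'end'
)

# With an array on the left, -match returns the matching elements.
# Wrapping the result in @() guarantees an array even for zero matches.
$jobCount = @($LinesLoadedOnce -match '^end$').Count

"File has $jobCount jobs"
```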
Instead of calling Clear-Variable many times per job, which incurs the cmdlet startup overhead each time, call it once and pass it an array of the names to clear, e.g. Clear-Variable -Name "insert_job", "job_type", "command", ..
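A short sketch of the single-call form, with three hypothetical values standing in for the parsed attributes:

```powershell
# Hypothetical values standing in for the parsed attributes.
$insert_job = 'job_a'
$job_type   = 'c'
$command    = 'run.cmd'

# One cmdlet invocation clears all three variables (each becomes $null).
Clear-Variable -Name insert_job, job_type, command

# Check that every variable was cleared.
$allCleared = ($null -eq $insert_job) -and ($null -eq $job_type) -and ($null -eq $command)
```

The -Name parameter accepts an array of strings, so the full list of attribute names can be passed in one call.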
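Instead of using >>storage.txt, which opens and closes the text file once for every line written, collect the output into an array and write it to the file in one go with Set-Content: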
$results = foreach ($line in [system.io.file]::ReadAllLines('c:\full\path\to\file\a.txt'))
{
#code here
"'$insert_job','$job_type','$command', .."
}
$results | Set-Content -Path storage.txt
The rest depends more on the format of the file, its size, and whether some lines should be skipped, but it could turn into something like this:
$headers = @(
'insert_job'
'job_type'
'command'
'machine'
'owner'
'permission'
'date_conditions'
'days_of_week'
'start_times'
'description'
'std_out_file'
'std_err_file'
'alarm_if_fail'
'end'
'box_name'
'condition'
'run_window'
'n_retrys'
'term_run_time'
'box_terminator'
'job_terminator'
'min_run_alarm'
'max_run_alarm'
'profile'
)
$headerRegex = "^($($headers -join '|'))\s*:\s*(.*?)\s*$"
$data = [ordered]@{}
foreach($h in $headers) { $data[$h] = $null }
$results = foreach ($line in [system.io.file]::ReadAllLines('c:\full\path\to\file\a.txt')) {
if ($line -match $headerRegex) {
$data[$matches[1]] = $matches[2]
}
elseif ($line -eq 'end') {
[PSCustomObject]$data
$data = [ordered]@{}
foreach($h in $headers) { $data[$h] = $null }
}
}
$results | Export-Csv storage.txt -NoTypeInformation
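This pushes more of the work into the regex engine, does less string handling and string interpolation, uses fewer variables and fewer cmdlets, avoids all the switch/break jumping, and will probably run faster.
I haven't tested it, because I don't know what your file contains.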
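To illustrate what the header regex captures, here is a small sketch with a shortened header list and a made-up input line (both are assumptions, not taken from the real file). The first capture group holds the field name and the second holds the value with surrounding whitespace already stripped.

```powershell
# Shortened, hypothetical header list for illustration only.
$headers = @('insert_job', 'job_type', 'command')
$headerRegex = "^($($headers -join '|'))\s*:\s*(.*?)\s*$"

# A made-up input line with extra whitespace around the value.
$line = 'job_type :   c   '

if ($line -match $headerRegex) {
    # $matches[1] is the field name, $matches[2] the value without padding.
    $name  = $matches[1]
    $value = $matches[2]
    "name=[$name] value=[$value]"
}
```

Because the regex does the trimming, no separate Split or Trim call is needed per line.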
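There are good pointers in the comments and in TessellatingHeckler's answer, but let me try to put them all together along with further speed improvements; see the comments in the code.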
rm storage.txt
$job_counter = 0
$att_counter = 0
# Read all lines into memory up front, using the .NET framework directly
# which is much faster than using Get-Content.
$allLines = [IO.File]::ReadAllLines("$PWD/a.txt")
# Count the lines that contain 'end' exactly to determine
# the number of jobs.
$job_counter = ($allLines -eq 'end').Count
"File has $job_counter jobs"
$job_counter = 0
# Initialize the hashtable whose entries will contain the values
# (rather than individual variables).
$values = [ordered] @{}
# Use a `switch` statement to process the lines, which is generally
# faster than a `foreach` loop.
switch ($allLines) {
'end' { # Use string equality (not regex matching), which is faster.
$job_counter++
"encountered job number $job_counter, it has $att_counter attributes"
# Write the values to the output file.
"'$($values.insert_job)','$($values.job_type)','$($values.command)','$($values.machine)','$($values.owner)','$($values.permission)','$($values.date_conditions)','$($values.days_of_week)','$($values.start_times)','$($values.description)','$($values.std_out_file)','$($values.std_err_file)','$($values.alarm_if_fail)','$($values.end)','$($values.box_name)','$($values.condition)','$($values.run_window)','$($values.n_retrys)','$($values.term_run_time)','$($values.box_terminator)','$($values.job_terminator)','$($values.min_run_alarm)','$($values.max_run_alarm)','$($values.profile)'" >>storage.txt
# Clear the values from the hashtable.
$values.Clear()
# Reset the per-job attribute count, as the script in the question does.
$att_counter = 0
}
default {
# Split the line at hand into field name and value...
$fieldName, $value = $_ -split ':', 2, 'SimpleMatch'
# ... and add an entry for the pair to the hashtable.
$values.$fieldName = $value.trim()
$att_counter++
}
}
Note: the solution above does not validate the field names found in the input file. Doing so would require extra work.
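One possible shape for that extra work, as a rough sketch rather than part of the answer's solution: keep the expected names in a hashtable and check each parsed field name against it. The shortened name list, the sample lines, and the variable names $knownFields and $unknown are all assumptions made for illustration.

```powershell
# Hypothetical whitelist; the real script would list every expected field name.
$knownFields = @{}
foreach ($name in 'insert_job', 'job_type', 'command') {
    $knownFields[$name] = $true
}

# Made-up sample lines, one of which has an unexpected field name.
$sampleLines = 'insert_job: job_a', 'bogus_field: x', 'command: run.cmd'

# Collect every field name that is not in the whitelist.
$unknown = @(
    foreach ($line in $sampleLines) {
        $fieldName, $value = $line -split ':', 2, 'SimpleMatch'
        $fieldName = $fieldName.Trim()
        if (-not $knownFields.ContainsKey($fieldName)) {
            $fieldName
        }
    }
)

"Unknown fields: $($unknown -join ', ')"
```

A hashtable lookup keeps the check cheap per line; whether to skip, warn about, or abort on unknown names is a separate design decision.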