Optimizing a script
I've created a script which analyzes the debug logs from Windows DNS Server. It opens the debug log using the [System.IO.File] class and then processes each line. The first two steps, opening and reading the file, take the longest. In fact, they take a seemingly endless amount of time, because the file is growing as it is being read.
Due to the size of the debug log (80,000 KB) it takes a very long time. I believe that my code is fine for smaller text files, but it fails to deal with much larger files.

Here is my code: https://github.com/cetanu/msDnsStats/blob/master/msdnsStats.ps1
This is what the debug looks like (including the blank lines):

21/03/2014 2:20:03 PM 0D0C PACKET 0000000005FCB280 UDP Rcv 202.90.34.177 3709 Q [1001 D NOERROR] A (2)up(13)massrelevance(3)com(0)

21/03/2014 2:20:03 PM 0D0C PACKET 00000000042EB8B0 UDP Rcv 67.215.83.19 097f Q [0000 NOERROR] CNAME (15)manchesterunity(3)org(2)au(0)

21/03/2014 2:20:03 PM 0D0C PACKET 0000000003131170 UDP Rcv 62.36.4.166 a504 Q [0001 D NOERROR] A (3)ekt(4)user(7)net0319(3)com(0)

21/03/2014 2:20:03 PM 0D0C PACKET 00000000089F1FD0 UDP Rcv 80.10.201.71 3e08 Q [1000 NOERROR] A (4)dns1(5)offis(3)com(2)au(0)

Multiply this by about 100,000,000 and you have my debug log.
I need ways or ideas on how to open and read each line of a file more quickly than I am doing now. I am open to suggestions of using a different language.

I would trade this:
$dnslog = [System.IO.File]::Open("c:\dns.log","Open","Read","ReadWrite")
$dnslog_content = New-Object System.IO.StreamReader($dnslog)

For ($i = 0; $i -lt $dnslog.length; $i++)
{
    $line = $dnslog_content.ReadLine()
    if ($line -eq $null) { continue }

    # REGEX MATCH EACH LINE OF LOGFILE
    $pattern = $line | Select-String -Pattern $regex

    # IGNORE EMPTY MATCH
    if ($pattern -eq $null)
    {
        continue
    }
for this:
Get-Content 'c:\dns.log' -ReadCount 1000 |
    ForEach-Object {
        foreach ($line in $_)
        {
            if ($line -match $regex)
            {
                # Process matches
            }
        }
    }
That will reduce the number of file read operations by a factor of 1000.
Trading the Select-String operation will require re-factoring the rest of the code to work with $matches[n] instead of $pattern.matches[0].groups[$n].value, but it is much faster. Select-String returns MatchInfo objects which contain a lot of additional information about the match (line number, filename, etc.), which is great if you need it. If all you need is strings from the captures, then it's wasted effort.
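As a sketch of that refactoring (the pattern here is illustrative, not the actual $regex from the script):

```powershell
# A sample line from the debug log above
$line = '21/03/2014 2:20:03 PM 0D0C PACKET 0000000005FCB280 UDP Rcv 202.90.34.177 3709 Q [1001 D NOERROR] A (2)up(13)massrelevance(3)com(0)'

# Illustrative pattern: capture protocol, direction, and client IP
if ($line -match '\s(UDP|TCP)\s+(Rcv|Snd)\s+(\S+)')
{
    # -match fills the automatic $matches variable;
    # $matches[1] replaces $pattern.matches[0].groups[1].value
    $protocol  = $matches[1]   # UDP
    $direction = $matches[2]   # Rcv
    $client    = $matches[3]   # 202.90.34.177
}
```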
You're creating an object ($log), and then accumulating values into array properties:
$log.date += @($pattern.matches[0].groups[$n].value); $n++
That array addition is going to kill your performance: each += allocates a new array and copies every existing element into it, so the cost grows with the size of the array on every addition. Also, hash table operations are faster than object property updates.
I'd create $log as a hash table first, and the key values as array lists:
$log = @{}
$log.date = New-Object collections.arraylist
Then inside your loop:
$log.date.add($matches[1]) > $null
Then create your object from $log after you've populated all of the array lists.
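Put together, the pattern looks roughly like this (the field names and the final object creation are illustrative; $lines and $regex are assumed from the surrounding code):

```powershell
# Hash table of ArrayLists accumulates values cheaply
$log = @{}
$log.date  = New-Object System.Collections.ArrayList
$log.query = New-Object System.Collections.ArrayList

foreach ($line in $lines)
{
    if ($line -match $regex)
    {
        # ArrayList.Add appends without copying the whole collection;
        # > $null discards the index that Add returns
        $log.date.add($matches[1])  > $null
        $log.query.add($matches[2]) > $null
    }
}

# Build the output object once, after the loop
$result = New-Object PSObject -Property $log
```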
As a general piece of advice, use Measure-Command to find out which script blocks take the longest time.
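A minimal illustration, timing one candidate approach against the log file (the path is assumed):

```powershell
# Returns a TimeSpan showing how long the script block took to run
Measure-Command {
    Get-Content 'c:\dns.log' -ReadCount 1000 | Out-Null
}
```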
That being said, the sleep process seems a bit weird. If I'm not mistaken, you sleep 20 ms after each row:
sleep -milliseconds 20
Multiply 20 ms by the log size, 100 million iterations, and you'll get quite a long total sleep time (20 ms times 100,000,000 is roughly 23 days of sleeping alone).
Try sleeping only after some decent batch size instead. See whether 10,000 rows works well, like so:
if ($i % 10000 -eq 0) {
    write-host -nonewline "."
    start-sleep -milliseconds 20
}