Optimizing a script

Info

I've created a script which analyzes the debug logs from Windows DNS Server.

It does the following:

  1. Opens the debug log using the [System.IO.File] class
  2. Performs a regex match on each line
  3. Separates 16 capture groups into different properties inside a custom object
  4. Fills dictionaries and appends to the value of each key to produce statistics

Steps 1 and 2 take the longest. In fact, they take a seemingly endless amount of time, because the file is growing as it is being read.

Problem

Because of the size of the debug log (80,000 KB), the script takes a very long time to run.

I believe that my code is fine for smaller text files, but it fails to deal with much larger files.

Code

Here is my code: https://github.com/cetanu/msDnsStats/blob/master/msdnsStats.ps1

Debug log preview

This is what the debug log looks like (including the blank lines):

Multiply this by about 100,000,000 and you have my debug log.

21/03/2014 2:20:03 PM 0D0C PACKET  0000000005FCB280 UDP Rcv 202.90.34.177   3709   Q [1001   D   NOERROR] A      (2)up(13)massrelevance(3)com(0)

21/03/2014 2:20:03 PM 0D0C PACKET  00000000042EB8B0 UDP Rcv 67.215.83.19    097f   Q [0000       NOERROR] CNAME  (15)manchesterunity(3)org(2)au(0)

21/03/2014 2:20:03 PM 0D0C PACKET  0000000003131170 UDP Rcv 62.36.4.166     a504   Q [0001   D   NOERROR] A      (3)ekt(4)user(7)net0319(3)com(0)

21/03/2014 2:20:03 PM 0D0C PACKET  00000000089F1FD0 UDP Rcv 80.10.201.71    3e08   Q [1000       NOERROR] A      (4)dns1(5)offis(3)com(2)au(0)

Request

I need ways or ideas on how to open and read each line of a file more quickly than what I am doing now.

I am open to suggestions of using a different language.

I would trade this:

$dnslog = [System.IO.File]::Open("c:\dns.log","Open","Read","ReadWrite")
$dnslog_content = New-Object System.IO.StreamReader($dnslog)

For ($i=0;$i -lt $dnslog.length; $i++)
{
    $line = $dnslog_content.readline()
    if ($line -eq $null) { continue }

    # REGEX MATCH EACH LINE OF LOGFILE
    $pattern = $line | select-string -pattern $regex

    # IGNORE EMPTY MATCH
    if ($pattern -eq $null) {
        continue
    }

    # (rest of the loop body)
}

for this:

Get-Content 'c:\dns.log' -ReadCount 1000 |
  ForEach-Object {
    foreach ($line in $_)
    {
      if ($line -match $regex)
      {
        # Process matches
      }
    }
  }

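That will reduce the number of pipeline operations (objects passed from Get-Content to ForEach-Object) by a factor of 1000, because -ReadCount sends the lines down the pipeline in batches of 1000 instead of one at a time.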
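To see the batching in action, here is a minimal sketch. It assumes nothing about the real log: the temp file and its 2,500 made-up lines exist only for the demo.

```powershell
# Write a small throwaway file so the example is self-contained
$path = [System.IO.Path]::GetTempFileName()
1..2500 | ForEach-Object { "line $_" } | Set-Content -Path $path

# With -ReadCount 1000, each pipeline object is an array of up to 1000 lines
$batchSizes = Get-Content -Path $path -ReadCount 1000 |
    ForEach-Object { $_.Count }

$batchSizes    # 1000, 1000, 500

Remove-Item -Path $path
```

ForEach-Object runs three times here instead of 2,500 times, which is why the inner foreach over $_ is needed to get back to individual lines.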

Replacing the Select-String operation will require refactoring the rest of the code to work with $matches[$n] instead of $pattern.matches[0].groups[$n].value, but -match is much faster. Select-String returns MatchInfo objects, which carry a lot of additional information about the match (line number, filename, etc.). That is great if you need it, but if all you need is the captured strings, it's wasted effort.
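A minimal sketch of what the -match version looks like. The pattern below is a simplified, hypothetical stand-in with three capture groups (the script's real $regex has 16 and is not reproduced here), and the input lines are copied from the log preview in the question.

```powershell
# Hypothetical, simplified pattern: captures protocol, direction and client IP only.
$regex = 'PACKET\s+\S+\s+(UDP|TCP)\s+(Rcv|Snd)\s+(\d{1,3}(?:\.\d{1,3}){3})'

$sample = @(
    '21/03/2014 2:20:03 PM 0D0C PACKET  0000000005FCB280 UDP Rcv 202.90.34.177   3709   Q [1001   D   NOERROR] A      (2)up(13)massrelevance(3)com(0)'
    ''
    '21/03/2014 2:20:03 PM 0D0C PACKET  00000000042EB8B0 UDP Rcv 67.215.83.19    097f   Q [0000       NOERROR] CNAME  (15)manchesterunity(3)org(2)au(0)'
)

$clients = New-Object System.Collections.ArrayList
foreach ($line in $sample) {
    # -match returns $true/$false and fills the automatic $matches hashtable
    if ($line -match $regex) {
        # $matches[0] is the whole match; $matches[1], [2], [3] are the capture groups
        [void]$clients.Add($matches[3])
    }
}
$clients
```

The blank line simply fails the match and is skipped, so no separate null check is needed.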

You're creating an object ($log), and then accumulating values into array properties:

$log.date                += @($pattern.matches[0].groups[$n].value); $n++

That array addition is going to kill your performance: arrays are fixed-size, so every += creates a new array and copies the existing elements into it. Also, hash table operations are faster than object property updates.
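If you want to check the cost of array addition on your own machine, a rough comparison could look like this. The element count is an arbitrary illustrative value, and the timings will vary with hardware and PowerShell version.

```powershell
$n = 20000   # illustrative size; the real log is far larger

# Array +=: each addition allocates a new array and copies the old elements
$arrayTime = Measure-Command {
    $a = @()
    for ($i = 0; $i -lt $n; $i++) { $a += $i }
}

# ArrayList.Add: appends to the existing list in place
$listTime = Measure-Command {
    $l = New-Object System.Collections.ArrayList
    for ($i = 0; $i -lt $n; $i++) { [void]$l.Add($i) }
}

'{0:N0} ms with +=, {1:N0} ms with ArrayList.Add' -f $arrayTime.TotalMilliseconds, $listTime.TotalMilliseconds
```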

I'd create $log as a hash table first, with array lists as the values:

$log = @{}
$log.date = New-Object collections.arraylist

Then inside your loop:

$log.date.Add($matches[1]) > $null

Then create your object from $log after you've populated all of the array lists.
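Put together, the pattern might look like the sketch below. Only two keys are shown; "client" is a hypothetical name, and the hard-coded values stand in for what the parsing loop would pull out of $matches.

```powershell
# Hash table whose values are ArrayLists (only two illustrative keys shown)
$log = @{}
$log.date   = New-Object System.Collections.ArrayList
$log.client = New-Object System.Collections.ArrayList   # hypothetical key name

# Stand-in for the parsing loop: values copied from the log preview.
# ArrayList.Add returns the new index, so discard it with [void] or "> $null".
[void]$log.date.Add('21/03/2014')
[void]$log.client.Add('202.90.34.177')
[void]$log.date.Add('21/03/2014')
[void]$log.client.Add('67.215.83.19')

# Build the object once, after all of the lists are populated
$logObject = New-Object PSObject -Property $log
$logObject.client.Count
```

Each hash table key becomes a property on $logObject, so code that reads $logObject.date later keeps working.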

As a general piece of advice, use Measure-Command to find out which script blocks take the longest time.
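For instance, a quick way to compare two ways of matching a line is sketched below. The pattern is a hypothetical stand-in rather than the script's actual regex, the repetition count is arbitrary, and the numbers will differ from machine to machine.

```powershell
$line  = '21/03/2014 2:20:03 PM 0D0C PACKET  0000000005FCB280 UDP Rcv 202.90.34.177   3709   Q [1001   D   NOERROR] A      (2)up(13)massrelevance(3)com(0)'
$regex = 'UDP\s+Rcv\s+(\S+)'   # hypothetical stand-in pattern
$reps  = 5000                  # arbitrary repetition count

# Time the Select-String approach
$selectStringTime = Measure-Command {
    for ($i = 0; $i -lt $reps; $i++) { $null = $line | Select-String -Pattern $regex }
}

# Time the -match operator on the same input
$matchTime = Measure-Command {
    for ($i = 0; $i -lt $reps; $i++) { $null = $line -match $regex }
}

'Select-String: {0:N0} ms' -f $selectStringTime.TotalMilliseconds
'-match:        {0:N0} ms' -f $matchTime.TotalMilliseconds
```

Measure-Command returns a TimeSpan, so TotalMilliseconds or TotalSeconds can be read off directly.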

That being said, the sleep call seems a bit odd. If I'm not mistaken, you sleep for 20 ms after each row:

sleep -milliseconds 20

Multiply 20 ms by the number of iterations, 100 million, and you get quite a long total sleep time.

Try sleeping once per batch instead. For example, see whether a batch of 10,000 rows works well:

if($i % 10000 -eq 0) {
    write-host -nonewline "."
    start-sleep -milliseconds 20
}
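To put numbers on the difference, here is the arithmetic as a small sketch. It takes the "about 100,000,000" rows from the question at face value and counts only the time spent sleeping, not the processing itself.

```powershell
$sleepMs   = 20
$rows      = 100000000   # "about 100,000,000" from the question
$batchSize = 10000

# Total sleep when sleeping after every row
$perRow = [TimeSpan]::FromMilliseconds([double]($sleepMs * $rows))

# Total sleep when sleeping once per batch of 10,000 rows
$perBatch = [TimeSpan]::FromMilliseconds([double]($sleepMs * ($rows / $batchSize)))

'Per row:   {0:N1} days' -f $perRow.TotalDays
'Per batch: {0:N0} seconds' -f $perBatch.TotalSeconds
```

That works out to roughly 23 days of pure sleeping per row versus 200 seconds per batch.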
