Need to make a PowerShell script faster

I taught myself PowerShell, so I do not know everything about it.

I need to search a database with the exact amount of lines I have put in (the database is predefined); it contains > 11800 entries.

Can you please help me find what is making this slow?

Code:

$Dict = Get-Content "C:\Users\----\Desktop\Powershell Program\US.txt"

if($Right -ne "") {
    $Comb = $Letter + $Right
    $total = [int]0    
    $F = ""

    do {
        $F = $Dict | Select-Object -Index $total
        if($F.Length -eq $Num) {
            if($F.Chars("0") + $F.Chars("1") -eq $Comb) {
                Add-Content "C:\Users\----\Desktop\Powershell Program\Results.txt" "$F"
            }
        }
        $total++
        Write-Host $total
    } until([int]$total -gt [int]118619)

    $total = [int]0
    $F = ""
}

How do I speed up this line-by-line searching/matching process? Do I do it by multi-threading? If so, how?

It seems like you knew at least one other language before PowerShell, and are starting out by basically replicating in this one what you might have done in another. That's a great way to learn a new language, but of course in the beginning you might end up with methods that are a bit strange or not performant.

So first I want to break down what your code is actually doing, as a rough overview:

  1. Read every line of the file at once and store it in the $Dict variable.
  2. Loop the same number of times as there are lines.
  3. In each iteration of the loop:
    1. Get the single line that matches the loop iteration (essentially through another iteration, rather than indexing, more on that later).
    2. Get the first character of the line, then the second, then combine them.
    3. If that's equal to a pre-determined string, append this line to a text file.

Step 3-1 is what's really slowing this down

To understand why, you need to know a little bit about pipelines in PowerShell. Cmdlets that accept and work on pipelines take one or more objects, but they process a single object at a time. They don't even have access to the rest of the pipeline.
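A minimal sketch of that one-object-at-a-time behavior. Show-Item is a hypothetical helper written only for this illustration (it is not part of the original script); its process block runs once per incoming pipeline object and only ever sees the current one in $_.

```powershell
# Hypothetical helper for illustration: the process block runs once per
# pipeline object, and $_ holds only the current object.
function Show-Item {
    process {
        "received: $_"
    }
}

# Three objects go in, so the process block runs three times.
$seen = 'a', 'b', 'c' | Show-Item
$seen
```

Select-Object sits in the pipeline the same way, which is why it cannot jump ahead to a later element.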

This is also true for the Select-Object cmdlet. So when you take an array with 18,500 objects in it and pipe it into Select-Object -Index 18000, you need to send in the 18,000 objects that precede that index for inspection/processing before it can give you the one you want. You can see how the time taken gets longer and longer the larger the index is.

Since you already have an array, you can directly access any array member by index with square brackets [] like so:

$Dict[18000]

For a given array, that takes the same amount of time no matter what the index is.
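As a small sketch, both approaches return the same element; only the route differs. The four-word array below is a made-up stand-in for the real dictionary file.

```powershell
# Small stand-in for the dictionary array; the words are made up.
$Dict = 'apple', 'banana', 'cherry', 'date'

# Pipeline approach: Select-Object is handed the elements one at a time
# until it reaches the requested (zero-based) index.
$viaPipeline = $Dict | Select-Object -Index 2

# Direct indexing: goes straight to the element.
$viaIndex = $Dict[2]

# Both hold 'cherry', so this comparison is True.
$viaPipeline -eq $viaIndex
```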

Now for a single call to Select-Object -Index you probably aren't going to notice how long it takes, even with a very large index; the problem is that you're already looping through the entire array, so this compounds greatly.

You're essentially having to do the sum of 1..18000, which is 18000 × 18001 / 2, or approximately 162,000,000 iterations! (thanks to user2460798 for correcting my math)
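The arithmetic can be checked directly. This sketch assumes, as a simplification, that fetching index k through the pipeline costs about k steps, and uses n = 18000 from the example.

```powershell
# Closed form for 1 + 2 + ... + n, with n = 18000 as in the example.
$n = 18000
$closedForm = $n * ($n + 1) / 2

# Cross-check by summing the range explicitly.
$summed = (1..$n | Measure-Object -Sum).Sum

$closedForm    # 162009000
$summed        # 162009000
```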

Proof

I tested this. First, I created an array with 19,000 objects:

$a = 1..19000 | %{"zzzz~$_"}

Then I measured both methods of accessing it. First, with select -index:

measure-command { 1..19000 | % { $a | select -Index ($_-1 ) } | out-null }

Result:

TotalMinutes      : 20.4383861316667
TotalMilliseconds : 1226303.1679

Then with the indexing operator ([]):

measure-command { 1..19000 | % { $a[$_-1] } | out-null }

Result:

TotalMinutes      : 0.00788774666666667
TotalMilliseconds : 473.2648

The results are pretty striking: it takes nearly 2,600 times longer to use Select-Object.

A counting loop

The above is the single thing causing your major slowdown, but I wanted to point out something else.

Typically in most languages, you would use a for loop to count. In PowerShell this would look like this:

for ($i = 0; $i -lt $total ; $i++) {
    # $i has the value of the iteration
}

In short, there are three statements in the for loop. The first is an expression that gets run once before the loop starts. $i = 0 initializes the iterator to 0, which is the typical usage of this first statement.

Next is a conditional; it is tested before each iteration, and the loop continues as long as it returns true. Here $i -lt $total checks that $i is less than the value of $total, some other variable defined elsewhere, presumably the maximum value.

The last statement gets executed at the end of each iteration of the loop. $i++ is the same as $i = $i + 1, so in this case we're incrementing $i on each iteration.

It's a bit more concise than using a do / until loop, and it's easier to follow because the meaning of a for loop is well known.
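Putting the two points together, here is a rough sketch of how the search from the question might look with a for loop and direct indexing. The variable names $Dict, $Letter, $Right, $Num, $Comb and $F come from the question, but the sample words and values are made up, Substring(0, 2) stands in for the two Chars calls, and collecting matches in a list instead of appending to the results file is a choice made only for this sketch. It also assumes $Num is at least 2.

```powershell
# Made-up stand-ins for the question's inputs.
$Dict   = 'stone', 'apple', 'stars', 'st', 'stack', 'tests'
$Letter = 's'
$Right  = 't'
$Num    = 5

$Comb = $Letter + $Right

# Collect matches in memory; writing them to the results file is left out here.
$results = New-Object System.Collections.Generic.List[string]

for ($i = 0; $i -lt $Dict.Count; $i++) {
    $F = $Dict[$i]    # direct indexing, no pipeline
    # Keep words of the requested length whose first two characters match.
    if ($F.Length -eq $Num -and $F.Substring(0, 2) -eq $Comb) {
        $results.Add($F)
    }
}

$results    # stone, stars, stack
```

Using $Dict.Count as the upper bound also avoids hard-coding the number of lines in the file.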

Other Notes

If you're interested in more feedback about working code you've written, have a look at Code Review. Please read the rules there carefully before posting.

To my surprise, using the array's GetEnumerator is faster than indexing. It takes about 5/8 of the time of indexing. However, this test is pretty unrealistic, in that the body of each loop is about as small as it can be.

$size = 64kb

$array = New-Object int[] $size
# Initializing the array takes quite a bit of time compared to the loops below
0..($size-1) | % { $array[$_] = get-random}

write-host `n`nMeasure using indexing
[uint64]$sum = 0
Measure-Command {
  for ($ndx = 0; $ndx -lt $size; $ndx++) {
    $sum += $array[$ndx]
  }
}
write-host Average = ($sum / $size)

write-host `n`nMeasure using array enumerator
[uint64]$sum = 0
Measure-Command {
  foreach ($element in $array.GetEnumerator()) {
    $sum += $element
  }
}
write-host Average = ($sum / $size)



Measure using indexing


Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 0
Milliseconds      : 898
Ticks             : 8987213
TotalDays         : 1.04018668981481E-05
TotalHours        : 0.000249644805555556
TotalMinutes      : 0.0149786883333333
TotalSeconds      : 0.8987213
TotalMilliseconds : 898.7213

Average = 1070386366.9346


Measure using array enumerator
Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 0
Milliseconds      : 559
Ticks             : 5597112
TotalDays         : 6.47813888888889E-06
TotalHours        : 0.000155475333333333
TotalMinutes      : 0.00932852
TotalSeconds      : 0.5597112
TotalMilliseconds : 559.7112

Average = 1070386366.9346

Code for these two in assembler might look like:

;       Using Indexing
mov     esi, <addr of array>
xor     ebx, ebx
lea     edi, <addr of $sum>
loop:
mov     eax, dword ptr [esi][ebx*4]
add     dword ptr [edi], eax
inc     ebx
cmp     ebx, 65536
jl      loop

;       Using enumerator
mov     esi, <addr of array>
lea     edx, [esi + 65536*4]
lea     edi, <addr of $sum>
loop:
mov     eax, dword ptr [esi]
add     dword ptr [edi], eax
add     esi, 4
cmp     esi, edx
jl      loop

The only difference is in the first mov instruction in the loop, with one using an index register and the other not. I kind of doubt that would explain the observed difference in speed. I guess the JITter must add additional overhead.
