Need to make a PowerShell script faster
I taught myself PowerShell, so I do not know everything about it.
I need to search a database with the exact amount of lines I have put in (the database is predefined); it contains > 11800 entries.
Can you please help me find what is making this slow?
Code:
$Dict = Get-Content "C:\Users\----\Desktop\Powershell Program\US.txt"
if($Right -ne "") {
    $Comb = $Letter + $Right
    $total = [int]0
    $F = ""
    do {
        $F = $Dict | Select-Object -Index $total
        if($F.Length -eq $Num) {
            if($F.Chars("0") + $F.Chars("1") -eq $Comb) {
                Add-Content "C:\Users\----\Desktop\Powershell Program\Results.txt" "$F"
            }
        }
        $total++
        Write-Host $total
    } until([int]$total -gt [int]118619)
    $total = [int]0
    $F = ""
}
How do I speed up this line-by-line searching/matching process? Should I use multi-threading? If so, how?
It seems like you've known at least one other language before PowerShell, and are starting out by basically replicating in this one what you might have done in another. That's a great way to learn a new language, but of course in the beginning you might end up with methods that are a bit strange or not performant.
So first I want to break down what your code is actually doing, as a rough overview:
It reads every line of the file at once and stores them in the $Dict variable, then loops over the lines and fetches each one with Select-Object -Index.
That last step is what makes it slow. To understand why, you need to know a little bit about pipelines in PowerShell. Cmdlets that accept and work on pipelines take one or more objects, but they process a single object at a time. They don't even have access to the rest of the pipeline.
This is also true for the Select-Object cmdlet. So when you take an array with 18,500 objects in it and pipe it into Select-Object -Index 18000, you need to send in the 18,000 objects that come before that (zero-based) index for inspection/processing before it can give you the one you want. You can see how the time taken would get longer and longer the larger the index is.
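To make the one-object-at-a-time behaviour concrete, here is a minimal sketch. The $counter hashtable and the sample range 1..100 are made up for illustration; the sketch simply counts how many objects travel down the pipeline before Select-Object -Index 50 hands back its result.

```powershell
# Hypothetical counter. A hashtable is used so the script block can update it
# no matter which scope it runs in.
$counter = @{ Seen = 0 }

# The middle stage counts each object as it passes through.
# -Index is zero-based, so index 50 asks for the 51st object.
$result = 1..100 |
    ForEach-Object { $counter.Seen++; $_ } |
    Select-Object -Index 50

"Result: $result"
"Objects sent down the pipeline: $($counter.Seen)"
```

The count is at least 51: every object ahead of the requested one has to be handed to Select-Object before it can return anything.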
Since you already have an array, you can directly access any array member by index with square brackets [], like so:
$Dict[18000]
For a given array, that takes the same amount of time no matter what the index is.
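As a quick check that both approaches return the same element, here is a small sketch with a made-up array ($words is a hypothetical stand-in for $Dict):

```powershell
# Hypothetical sample data standing in for the contents of $Dict
$words = 'apple', 'banana', 'cherry', 'date'

# Pipeline approach: the elements before index 2 are passed to Select-Object first
$viaPipeline = $words | Select-Object -Index 2

# Indexing operator: goes straight to the element
$viaIndex = $words[2]

"$viaPipeline / $viaIndex"
```

Both variables end up holding 'cherry'; only the amount of work done to get there differs.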
Now for a single call to Select-Object -Index you probably aren't going to notice how long it takes, even with a very large index; the problem is that you're looping through the entire array already, so this is compounding greatly.
You're essentially having to do the sum of 1..18000, which is (18000 × 18001) / 2 = 162,009,000, or approximately 162,000,000 iterations! (thanks to user2460798 for correcting my math)
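The figure can be checked directly in PowerShell. This is only the arithmetic for the sum of 1..18000, done once with the closed form n(n+1)/2 and once by brute force; the variable names are made up for the example.

```powershell
$n = 18000

# Closed form for 1 + 2 + ... + n
$closedForm = $n * ($n + 1) / 2

# Brute-force check: sum the range itself
$bruteForce = (1..$n | Measure-Object -Sum).Sum

"Closed form: $closedForm"
"Brute force: $bruteForce"
```

Both come out to 162009000.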
I tested this. First, I created an array with 19,000 objects:
$a = 1..19000 | %{"zzzz~$_"}
Then I measured both methods of accessing it. First, with select -Index:
measure-command { 1..19000 | % { $a | select -Index ($_-1 ) } | out-null }
Result:
TotalMinutes : 20.4383861316667
TotalMilliseconds : 1226303.1679
Then with the indexing operator ([]):
measure-command { 1..19000 | % { $a[$_-1] } | out-null }
Result:
TotalMinutes : 0.00788774666666667
TotalMilliseconds : 473.2648
The results are pretty striking: it takes nearly 2,600 times longer to use Select-Object.
The above is the single thing causing your major slowdown, but I wanted to point out something else.
Typically in most languages, you would use a for loop to count. In PowerShell this would look like this:
for ($i = 0; $i -lt $total ; $i++) {
# $i has the value of the iteration
}
In short, there are three statements in the for loop. The first is an expression that gets run before the loop starts. $i = 0 initializes the iterator to 0, which is the typical usage of this first statement.
Next is a conditional; this will be tested on each iteration and the loop will continue if it returns true. Here $i -lt $total checks that $i is less than the value of $total, some other variable defined elsewhere, presumably the maximum value.
The last statement gets executed on each iteration of the loop. $i++ is the same as $i = $i + 1, so in this case we're incrementing $i on each iteration.
It's a bit more concise than using a do/until loop, and it's easier to follow because the meaning of a for loop is well known.
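Putting the two points together, the search from the question could be sketched roughly like this. It is an illustration only: the sample words and the values of $Num and $Comb are assumptions, and $found is a made-up name. In the real script $Dict would still come from Get-Content.

```powershell
# Hypothetical sample data; in the question $Dict is read from US.txt
$Dict = 'stone', 'store', 'stair', 'apple', 'st', 'strap'
$Num  = 5      # required word length (assumed value)
$Comb = 'st'   # required first two letters (assumed value)

# for loop with direct indexing instead of do/until with Select-Object -Index
$found = for ($i = 0; $i -lt $Dict.Count; $i++) {
    $word = $Dict[$i]
    # The length test runs first, so Substring is never called on a too-short word
    if ($word.Length -eq $Num -and $word.Substring(0, 2) -eq $Comb) {
        $word   # emitted from the loop and collected in $found
    }
}

$found
```

With this sample data $found holds stone, store, stair and strap, which could then be written to the results file.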
If you're interested in more feedback about working code you've written, have a look at Code Review. Please read the rules there carefully before posting.
To my surprise, using the array's GetEnumerator is faster than indexing. It takes about 5/8 of the time of indexing. However, this test is pretty unrealistic, in that the body of each loop is about as small as it can be.
$size = 64kb
# Allocate an int array with $size elements
$array = New-Object int[] $size
# Initializing the array takes quite a bit of time compared to the loops below
0..($size-1) | % { $array[$_] = get-random}
write-host `n`nMeasure using indexing
[uint64]$sum = 0
Measure-Command {
    for ($ndx = 0; $ndx -lt $size; $ndx++) {
        $sum += $array[$ndx]
    }
}
write-host Average = ($sum / $size)
write-host `n`nMeasure using array enumerator
[uint64]$sum = 0
Measure-Command {
    foreach ($element in $array.GetEnumerator()) {
        $sum += $element
    }
}
write-host Average = ($sum / $size)
Measure using indexing
Days : 0
Hours : 0
Minutes : 0
Seconds : 0
Milliseconds : 898
Ticks : 8987213
TotalDays : 1.04018668981481E-05
TotalHours : 0.000249644805555556
TotalMinutes : 0.0149786883333333
TotalSeconds : 0.8987213
TotalMilliseconds : 898.7213
Average = 1070386366.9346
Measure using array enumerator
Days : 0
Hours : 0
Minutes : 0
Seconds : 0
Milliseconds : 559
Ticks : 5597112
TotalDays : 6.47813888888889E-06
TotalHours : 0.000155475333333333
TotalMinutes : 0.00932852
TotalSeconds : 0.5597112
TotalMilliseconds : 559.7112
Average = 1070386366.9346
Code for these two in assembler might look like:
; Using Indexing
mov esi, <addr of array>
xor ebx, ebx
lea edi, <addr of $sum>
loop:
mov eax, dword ptr [esi][ebx*4]
add dword ptr [edi], eax
inc ebx
cmp ebx, 65536
jl loop
; Using enumerator
mov esi, <addr of array>
lea edx, [esi + 65536*4]
lea edi, <addr of $sum>
loop:
mov eax, dword ptr [esi]
add dword ptr [edi], eax
add esi, 4
cmp esi, edx
jl loop
The only difference is in the first mov instruction in the loop, with one using an index register and the other not. I kind of doubt that would explain the observed difference in speed. I guess the JITter must add additional overhead.
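Applied to the original question, enumerating the array rather than indexing it would look something like the sketch below, which uses a plain foreach statement over the array. The sample words, the values of $Num and $Comb, and the name $found are assumptions for illustration, and no timing claim is made for this variant.

```powershell
# Hypothetical sample data; in the question $Dict is read from US.txt
$Dict = 'stone', 'store', 'stair', 'apple', 'st', 'strap'
$Num  = 5      # required word length (assumed value)
$Comb = 'st'   # required leading letters (assumed value)

# foreach walks the array element by element, with no index variable to manage
$found = foreach ($word in $Dict) {
    if ($word.Length -eq $Num -and
        $word.StartsWith($Comb, [System.StringComparison]::OrdinalIgnoreCase)) {
        $word   # collected in $found
    }
}

"Matches: $($found.Count)"
```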