How does PowerShell lazily evaluate this statement?
I was searching for a way to read only the first few lines of a CSV file and came across this answer. The accepted answer suggests using
Get-Content "C:\start.csv" | select -First 10 | Out-File "C:\stop.csv"
Another answer suggests using
Get-Content C:\Temp\Test.csv -TotalCount 3
Because my CSV is fairly large, I went with the second option. It worked fine. Out of curiosity, I decided to try the first option, assuming I could press Ctrl+C if it took forever. I was surprised to see that it returned just as quickly.
Is it safe to use the first approach when working with large files? How does PowerShell achieve this?
Yes, Select-Object -First n is "safe" for large files, provided you want to read only a small number of lines, so that the pipeline overhead is insignificant. Otherwise, Get-Content -TotalCount n will be more efficient.
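A quick way to check this on your own machine is to build a throwaway file and run both commands from the question against it. This is a minimal sketch, not a rigorous benchmark: the file name and its contents are made up for illustration, and the two Measure-Command timings are printed without expected values because they depend on the machine.

```powershell
# Build a throwaway CSV-like file in the temp directory (the name is arbitrary)
$path  = Join-Path ([System.IO.Path]::GetTempPath()) 'first-lines-demo.csv'
$lines = @('Id,Name') + (1..100000 | ForEach-Object { "$_,item$_" })
Set-Content -Path $path -Value $lines

# Approach 1: stream lines and let Select-Object stop the pipeline after 10 objects
$viaSelect = Get-Content -Path $path | Select-Object -First 10

# Approach 2: ask Get-Content itself to read at most 10 lines
$viaTotalCount = Get-Content -Path $path -TotalCount 10

# Both should yield the same 10 lines (the header plus 9 data rows)
$same = ($viaSelect -join "`n") -eq ($viaTotalCount -join "`n")
"Same result: $same"

# Rough timings in milliseconds; the values vary from machine to machine
(Measure-Command { Get-Content -Path $path | Select-Object -First 10 }).TotalMilliseconds
(Measure-Command { Get-Content -Path $path -TotalCount 10 }).TotalMilliseconds

Remove-Item -Path $path
```

Both timings should be small compared with reading the whole file, which is the point of the answer: neither command touches the remaining lines.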
It works like break in a loop: it exits the pipeline early once the given number of items has been processed. Internally, it throws a special exception that the PowerShell pipeline machinery recognizes.
Here is a demonstration that "abuses" Select-Object to break out of a ForEach-Object "loop", which is not possible with a normal break statement.
1..10 | ForEach-Object {
    Write-Host $_   # goes directly to the console, so it is ignored by Select-Object
    if( $_ -ge 3 ) { $true }   # "break" by outputting one item
} | Select-Object -First 1 | Out-Null
Output:
1
2
3
As you can see, Select-Object -First n actually breaks the pipeline instead of first reading all input and then selecting only the specified number of items.
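To make the early exit visible without Write-Host, here is a small sketch that counts how many items the upstream ForEach-Object actually emits before Select-Object -First 3 stops the pipeline. The $script:produced counter is an illustrative name, and the sketch assumes PowerShell 3.0 or later; older versions of Select-Object did not stop upstream commands early.

```powershell
# Illustrative counter: how many items the producer actually emitted
$script:produced = 0

$first = 1..1000000 | ForEach-Object {
    $script:produced++   # runs once per item that leaves the producer
    $_                   # pass the item downstream
} | Select-Object -First 3

"Selected: $($first -join ', ')"   # → Selected: 1, 2, 3
"Produced: $script:produced"       # → Produced: 3
```

Only three of the million candidate items are ever produced: as soon as Select-Object receives its third item, the whole upstream pipeline is stopped.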
Another, more common use case is when you want to find only a single item in the output of a pipeline. Then it makes sense to exit the pipeline as soon as you have found that item:
Get-ChildItem -Recurse | Where-Object { SomeCondition } | Select-Object -First 1
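SomeCondition above is a placeholder. As a deterministic stand-in, this sketch swaps Get-ChildItem for a plain number sequence and uses a made-up condition (the first number whose square exceeds 50); the $script:examined counter is likewise hypothetical and only shows that Where-Object stops being invoked once Select-Object has its one item.

```powershell
# Hypothetical counter of how many candidates the filter looked at
$script:examined = 0

# Stand-in for SomeCondition: first number whose square exceeds 50
$firstMatch = 1..1000 |
    Where-Object { $script:examined++; ($_ * $_) -gt 50 } |
    Select-Object -First 1

"First match: $firstMatch"        # → First match: 8
"Examined: $script:examined"      # → Examined: 8
```

With a real directory tree the saving is the same in kind: once the first matching file is found, Get-ChildItem stops walking the remaining folders.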
According to Microsoft, the Get-Content cmdlet has a parameter called -ReadCount. Their documentation states:
Specifies how many lines of content are sent through the pipeline at a time. The default value is 1. A value of 0 (zero) sends all of the content at one time.
This parameter does not change the content displayed, but it does affect the time it takes to display the content. As the value of ReadCount increases, the time it takes to return the first line increases, but the total time for the operation decreases. This can make a perceptible difference in large items.
Since -ReadCount defaults to 1, Get-Content effectively acts as a generator that reads a file line by line.
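To see what -ReadCount changes, this sketch counts how many objects Get-Content sends down the pipeline for a 10-line file. The function name Get-PipelineObjectCount and the temp file are hypothetical helpers for illustration. With the default of 1, every line is its own pipeline object, which is what lets Select-Object -First stop after n lines; larger values send arrays of lines instead.

```powershell
# Throwaway 10-line file (the name is arbitrary)
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'readcount-demo.txt'
Set-Content -Path $path -Value (1..10 | ForEach-Object { "line $_" })

# Hypothetical helper: count the objects Get-Content emits for a given -ReadCount
function Get-PipelineObjectCount([string]$Path, [int]$ReadCount) {
    $count = 0
    # ForEach-Object runs its script block in this function's scope, so $count updates here
    Get-Content -Path $Path -ReadCount $ReadCount | ForEach-Object { $count++ }
    $count
}

$byLine  = Get-PipelineObjectCount -Path $path -ReadCount 1   # 10 objects, one per line
$byThree = Get-PipelineObjectCount -Path $path -ReadCount 3   # 4 objects: 3 + 3 + 3 + 1 lines
$all     = Get-PipelineObjectCount -Path $path -ReadCount 0   # 1 object holding every line
"ReadCount 1: $byLine; ReadCount 3: $byThree; ReadCount 0: $all"

Remove-Item -Path $path
```

This is also why -ReadCount 0 would defeat the early exit: Select-Object -First 1 would receive the entire file as its single first item.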