How does PowerShell lazily evaluate this statement?

I was searching for a way to read only the first few lines of a CSV file and came across this answer. The accepted answer suggests using

Get-Content "C:\start.csv" | select -First 10 | Out-File "C:\stop.csv"

Another answer suggests using

Get-Content C:\Temp\Test.csv -TotalCount 3

Because my CSV is fairly large, I went with the second option. It worked fine. Out of curiosity, I decided to try the first option, assuming I could press Ctrl+C if it took forever. I was surprised to see that it returned just as quickly.

Is it safe to use the first approach when working with large files? How does PowerShell achieve this?

Yes, Select-Object -First n is "safe" for large files, provided you want to read only a small number of lines, so that the pipeline overhead is insignificant. Otherwise, Get-Content -TotalCount n will be more efficient.
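If you want to check this on your own machine, here is a minimal sketch that builds a scratch file and runs both approaches against it. The temp file, its size, and the variable names are assumptions made for illustration; timings depend on the machine, so none are quoted here.

```powershell
# Hypothetical scratch file; its path and row count are illustrative only
$path = [System.IO.Path]::GetTempFileName()
1..20000 | ForEach-Object { "row$_,value$_" } | Set-Content -Path $path

# Approach 1: let Select-Object stop the pipeline after ten lines
$viaSelect = Get-Content -Path $path | Select-Object -First 10

# Approach 2: tell Get-Content itself to stop after ten lines
$viaTotalCount = Get-Content -Path $path -TotalCount 10

# Compare-Object emits nothing when both collections match
$same = -not (Compare-Object $viaSelect $viaTotalCount)

# Rough timings in milliseconds for each approach
$msSelect     = (Measure-Command { Get-Content -Path $path | Select-Object -First 10 }).TotalMilliseconds
$msTotalCount = (Measure-Command { Get-Content -Path $path -TotalCount 10 }).TotalMilliseconds

Remove-Item -Path $path
```

Both variables should hold the same ten lines; $msSelect and $msTotalCount let you compare the cost of the two approaches for your own file sizes.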

It works like break in a loop: it exits the pipeline early, as soon as the given number of items has been processed. Internally, it throws a special exception that the PowerShell pipeline machinery recognizes.


Here is a demonstration that "abuses" Select-Object to break out of a ForEach-Object "loop", which is not possible with a normal break statement.

1..10 | ForEach-Object {
   Write-Host $_             # goes directly to console, so is ignored by Select-Object
   if( $_ -ge 3 ) { $true }  # "break" by outputting one item
} | Select-Object -First 1 | Out-Null

Output:

1
2
3

As you can see, Select-Object -First n actually stops the pipeline instead of first reading all input and then selecting only the specified number of items.
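To put a number on the early exit, the following sketch counts how many items the upstream stage actually handles. It assumes PowerShell 3.0 or later, where Select-Object -First stops upstream commands; $counter is a hypothetical name introduced only for this example.

```powershell
# A hashtable is used so the script block can mutate the count regardless of scope
$counter = @{ Value = 0 }

$firstFive = 1..1000000 |
    ForEach-Object { $counter.Value++; $_ } |   # count every item that reaches this stage
    Select-Object -First 5                      # stop the pipeline after five items

# Number of items the ForEach-Object stage processed before the pipeline stopped
$processed = $counter.Value
```

If the whole range were enumerated before selection, $processed would be 1000000; with the early exit it should equal the number of items requested.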


Another, more common use case is when you want to find only a single item in the output of a pipeline. Then it makes sense to exit the pipeline as soon as you have found that item:

Get-ChildItem -Recurse | Where-Object { SomeCondition } | Select-Object -First 1

According to Microsoft, the Get-Content cmdlet has a parameter called -ReadCount. Their documentation states:

Specifies how many lines of content are sent through the pipeline at a time. The default value is 1. A value of 0 (zero) sends all of the content at one time.

This parameter does not change the content displayed, but it does affect the time it takes to display the content. As the value of ReadCount increases, the time it takes to return the first line increases, but the total time for the operation decreases. This can make a perceptible difference in large items.

Since -ReadCount defaults to 1, Get-Content effectively acts as a generator that reads a file line by line.
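A small sketch of what -ReadCount changes in practice: it records how many lines each pipeline object carries, first with the default and then with -ReadCount 2. The five-line temp file and the variable names are assumptions for illustration.

```powershell
# Hypothetical five-line scratch file
$path = [System.IO.Path]::GetTempFileName()
'a', 'b', 'c', 'd', 'e' | Set-Content -Path $path

# Default (-ReadCount 1): every pipeline object is a single line
$perObjectDefault = @(Get-Content -Path $path | ForEach-Object { @($_).Count })

# -ReadCount 2: every pipeline object is a batch of up to two lines
$perObjectBatched = @(Get-Content -Path $path -ReadCount 2 | ForEach-Object { @($_).Count })

Remove-Item -Path $path
```

With the default, downstream commands see five objects of one line each; with -ReadCount 2 they see three batches (two lines, two lines, one line). Note that Select-Object -First n counts pipeline objects, so with batching it selects n batches rather than n lines.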
