PowerShell: get the sum of substrings at a specific position

Question

How can I use PowerShell to sum values taken from a substring of certain lines in a file and place the sum at a specific position on a different line, given the following conditions:

Get the sum of the numbers at positions 3 to 13 of every line that starts with the character D. Place the sum at positions 10 to 14 on the line that starts with S.

So for example, if I have this file:

F123trial   text
DA00000038.95==xxx11
DA00000018.95==yyy11
DA00000018.95==zzzyy
S        xxxxx

I want to get the sum of 38.95, 18.95 and 18.95 and then place the sum at the position marked xxxxx on the line that starts with S.

Answer 1

You could try:

-match to find the lines using a regex pattern
The .NET string method Substring() to extract the values from the "D" lines
Measure-Object -Sum to calculate the sum
-replace to insert the value (it searches using a regex pattern)

Example:

$text = Get-Content -Path file.txt

# For each "D" line, extract positions 3-13 and cast to double (to be able to sum it)
$total = $text -match '^D' |
    ForEach-Object { $_.Substring(2, 11) -as [double] } |
    Measure-Object -Sum | Select-Object -ExpandProperty Sum

# Format the sum as text; the invariant culture keeps "." as the decimal mark
$totalText = $total.ToString('0.00', [cultureinfo]::InvariantCulture)

$text | ForEach-Object {
    if ($_ -match '^S') {
        # Line starts with S -> keep the first 9 characters and put the sum at position 10
        $_.Substring(0, 9) + $totalText.PadLeft(5)
    } else {
        # Not an "S" line -> output original content
        $_
    }
} | Set-Content -Path file.txt

Answer 2

PowerShell's switch statement has powerful but little-known features that allow you to iterate over the lines of a file (-file) and match lines by regular expressions (-regex).

Not only is switch -file convenient, it is also much faster than using cmdlets in a pipeline (see the bottom section).

[double] $sum = 0

switch -regex -file file.txt {

  # Note: The string to the left of each script block below ({ ... }), 
  #       e.g., '^D', is the regex to match each line against.
  #       Inside the script blocks, $_ refers to the input line at hand.

  # Extract number, add to sum, output the line.
  '^D' { $sum += $_.Substring(2, 11); $_; continue }

  # Summary line: place sum at character position 10, with 0-padding
  # Note: `-replace ',', '.'` is only needed if your culture uses "," as the
  #       decimal mark.
  '^S' { $_.Substring(0, 9) + '{0:000000000000000.00}' -f $sum -replace ',', '.'; continue }
  
  # All other lines: pass them through.
  default { $_ }

}

Note:

continue in the script blocks short-circuits further matching for the line at hand; by contrast, if you used break, no further lines would be processed.
Based on a later comment, I'm assuming you want an 18-character, 0-left-padded number on the S line at character position 10.
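To make the difference between continue and break concrete, here is a minimal sketch that runs the same two branches over a small in-memory array instead of a file. The sample values and the variable names ($lines, $noKeyword, $withContinue, $withBreak) are made up for illustration.

```powershell
# Hypothetical input: two lines start with D, and all three end in a digit.
$lines = 'D1', 'X2', 'D3'

# No keyword: every matching branch runs for each line.
$noKeyword = switch -Regex ($lines) {
    '^D'  { "D-line: $_" }
    '\d$' { "ends in digit: $_" }
}

# continue: skip the remaining branches for this line, then move on to the next line.
$withContinue = switch -Regex ($lines) {
    '^D'  { "D-line: $_"; continue }
    '\d$' { "ends in digit: $_" }
}

# break: leave the switch statement entirely, so later lines are never processed.
$withBreak = switch -Regex ($lines) {
    '^D'  { "D-line: $_"; break }
    '\d$' { "ends in digit: $_" }
}

"no keyword: $(@($noKeyword).Count) outputs"
"continue:   $(@($withContinue).Count) outputs"
"break:      $(@($withBreak).Count) outputs"
```

With this input the three variants produce 5, 3 and 1 output strings respectively, which is why the solution uses continue rather than break.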

With your sample file, the switch -regex -file statement above yields:

F123trial   text
DA00000038.95==xxx11
DA00000018.95==yyy11
DA00000018.95==zzzyy
S        000000000000076.85
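If you would rather not depend on the current culture at all, one possible alternative to the -f / -replace combination is to format the number with the invariant culture. This is only a sketch of that idea, not part of the solution above; the variable names are illustrative and the sum is hard-coded from the sample file.

```powershell
# Sum of the three sample values, hard-coded here for illustration.
[double] $sum = 38.95 + 18.95 + 18.95

# Custom numeric format: 15 zero-padded integer digits, ".", 2 decimals = 18 characters.
# The invariant culture always uses "." as the decimal mark.
$padded = $sum.ToString('000000000000000.00', [cultureinfo]::InvariantCulture)

# Keep the first 9 characters of the S line and append the padded sum at position 10.
$line   = 'S        xxxxx'
$result = $line.Substring(0, 9) + $padded
$result
```

The result matches the S line shown in the output above, regardless of the regional settings of the machine.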

Optional reading: Comparing the performance of switch -file ... to Get-Content ... | ForEach-Object ...

Running the following test script:

& {
  # Create a sample file with 100K lines.
  1..1e5 > ($tmpFile = [IO.Path]::GetTempFileName())
  (Measure-Command { switch -file ($tmpFile) { default { $_ } } }).TotalSeconds, 
  (Measure-Command { get-content $tmpFile | % { $_ }  }).TotalSeconds
  Remove-Item $tmpFile
}

yields the following timings on my machine, for instance (the absolute numbers aren't important, but their ratio should give you a sense):

0.0578924   # switch -file
6.0417638   # Get-Content | ForEach-Object

That is, the pipeline-based solution is about 100 (!) times slower than the switch -file solution.

Digging deeper:

Frode F. points out that Get-Content is slow with large files - though its convenience makes it a popular choice - and mentions using the .NET Framework directly as an alternative:

Using [System.IO.File]::ReadAllLines(); however, given that it reads the entire file into memory, that is only an option with smallish files.
Using [System.IO.StreamReader]'s ReadLine() method in a loop.
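As a rough illustration of the second option applied to the task from the question, the following sketch sums the D lines with a StreamReader loop. The temporary sample file and the variable names are assumptions made so the snippet is self-contained; only the summing step is shown, not the rewriting of the S line.

```powershell
# Hypothetical sample file, created only so that the sketch can run on its own.
$path = [IO.Path]::GetTempFileName()
Set-Content -Path $path -Value 'F123trial   text', 'DA00000038.95==xxx11', 'DA00000018.95==yyy11', 'S        xxxxx'

[double] $sum = 0
$reader = New-Object IO.StreamReader $path
try {
    # ReadLine() returns $null once the end of the file is reached.
    while ($null -ne ($line = $reader.ReadLine())) {
        if ($line.StartsWith('D')) {
            # Positions 3 to 13 (1-based) correspond to Substring(2, 11).
            $sum += [double]::Parse($line.Substring(2, 11), [cultureinfo]::InvariantCulture)
        }
    }
} finally {
    # Release the file handle even if parsing fails.
    $reader.Dispose()
}
Remove-Item $path

$sum.ToString('0.00', [cultureinfo]::InvariantCulture)
```

Only one line is held in memory at a time, which is what makes this approach suitable for large files.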

However, use of the pipeline in itself, irrespective of the specific cmdlets used, introduces overhead. When performance matters - but only then - you should avoid it.

Here's an updated test that includes commands that use the .NET Framework methods, with and without the pipeline (the use of the collection operator .ForEach() requires PSv4+):

& {
  # Create a sample file with 100K lines.
  1..1e5 > ($tmpFile = [IO.Path]::GetTempFileName())
  
  (Measure-Command { switch -file ($tmpFile) { default { $_ } } }).TotalSeconds
  (Measure-Command { 
    $sr = [IO.StreamReader] (Convert-Path $tmpFile)
    while(-not $sr.EndOfStream) { $sr.ReadLine() }
    $sr.Close() 
  }).TotalSeconds
  (Measure-Command { [IO.File]::ReadAllLines((Convert-Path $tmpFile)).ForEach({ $_ }) }).TotalSeconds
  (Measure-Command { [IO.File]::ReadAllLines((Convert-Path $tmpFile)) | % { $_ } }).TotalSeconds
  (Measure-Command { Get-Content $tmpFile | % { $_ }  }).TotalSeconds
  
  Remove-Item $tmpFile
}

Sample results, from fastest to slowest:

0.0571143  # switch -file
0.2035162  # [System.IO.StreamReader] in a loop
0.6756535  # [System.IO.File]::ReadAllLines() with .ForEach() collection operator
1.5088355  # (pipeline) [System.IO.File]::ReadAllLines() with ForEach-Object
5.9815751  # (pipeline) Get-Content with ForEach-Object

switch -file is the fastest by a factor of around 3, followed by the .NET + loop solution; using .ForEach() adds another factor of 3. Simply introducing the pipeline (ForEach-Object instead of .ForEach()) adds another factor of 2; finally, using the pipeline with Get-Content and ForEach-Object adds another factor of 4.


Answer 1
1 2018-02-21 19:30:36

Answer 2
1 Accepted 2018-02-21 20:51:10
