PowerShell: get the sum of a specific substring position

How can I use PowerShell to sum values taken from a fixed substring position in a file and place the sum at a specific position on a different line, under the following conditions:

Get the sum of the numbers at positions 3 to 13 of each line that starts with the character D. Place the sum at positions 10 to 14 of the line that starts with S.

For example, if I have this file:

F123trial   text
DA00000038.95==xxx11
DA00000018.95==yyy11
DA00000018.95==zzzyy
S        xxxxx

I want to get the sum of 38.95, 18.95 and 18.95 and then place it where xxxxx appears on the line that starts with S.

You could try:

  • -match to find the lines using a regex pattern
  • The .NET string method Substring() to extract the values from the "D" lines
  • Measure-Object -Sum to calculate the sum
  • -replace to insert the value (it also searches using a regex pattern)

Example:

$text = Get-Content -Path file.txt

$total = $text -match '^D' |
#Foreach "D"-line, extract the value and cast to double (to be able to sum it)
ForEach-Object { $_.Substring(2,11) -as [double] } |
#Measure the sum
Measure-Object -Sum | Select-Object -ExpandProperty Sum

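# Format the sum as text first; $total is a [double], so $total.Length is not its printed width.
# The invariant culture keeps "." as the decimal mark.
$totalText = $total.ToString('0.00', [cultureinfo]::InvariantCulture)

$text | ForEach-Object {
    if($_ -match '^S') {
        #Line starts with S -> overwrite positions 10 to 14 (index 9, width 5) with the sum
        $line = $_.PadRight(14)
        $line.Substring(0, 9) + $totalText.PadLeft(5) + $line.Substring(14)
    } else {
        #Not "S"-line -> output original content
        $_
    }
} | Set-Content -Path file.txt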
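The list above mentions -replace, which the example itself does not use. As a minimal sketch of that variant, assuming the sample lines are held in memory rather than read from file.txt, and swapping in [decimal] for [double] to avoid binary floating-point rounding (the variable names are illustrative only):

```powershell
# In-memory sample lines; these stand in for the Get-Content output.
$lines = @(
    'F123trial   text'
    'DA00000038.95==xxx11'
    'DA00000018.95==yyy11'
    'DA00000018.95==zzzyy'
    'S        xxxxx'
)

# Sum positions 3 to 13 of every "D" line.
$sum = [decimal] 0
foreach ($line in $lines) {
    if ($line -match '^D') { $sum += [decimal] $line.Substring(2, 11) }
}

# The invariant culture keeps "." as the decimal mark.
$sumText = $sum.ToString('0.00', [cultureinfo]::InvariantCulture)

# On the "S" line, keep the first 9 characters (group 1) and replace the next 5.
# '${1}' rather than '$1' stops the digits of the sum from being read as part of the group number.
$result = $lines -replace '^(S.{8}).{5}', ('${1}' + $sumText)
```

Because -replace leaves non-matching strings untouched, the F and D lines pass through unchanged. Note that -match and -replace are case-insensitive by default; -cmatch and -creplace are the case-sensitive forms.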

PowerShell's switch statement has powerful but little-known features that allow you to iterate over the lines of a file (-file) and match lines by regular expressions (-regex).

Not only is switch -file convenient, it is also much faster than using cmdlets in a pipeline (see the bottom section).

[double] $sum = 0

switch -regex -file file.txt {

  # Note: The string to the left of each script block below ({ ... }), 
  #       e.g., '^D', is the regex to match each line against.
  #       Inside the script blocks, $_ refers to the input line at hand.

  # Extract number, add to sum, output the line.
  '^D' { $sum += $_.Substring(2, 11); $_; continue }

  # Summary line: place sum at character position 10, with 0-padding
  # Note: `-replace ',', '.'` is only needed if your culture uses "," as the
  #       decimal mark.
  '^S' { $_.Substring(0, 9) + '{0:000000000000000.00}' -f $sum -replace ',', '.'; continue }
  
  # All other lines: pass them through.
  default { $_ }

}

Note:

  • continue in a script block short-circuits further matching for the line at hand; by contrast, if you used break, no further lines would be processed.
  • Based on a later comment, I'm assuming you want an 18-character, 0-left-padded number on the S line, starting at character position 10.
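To make the continue/break difference concrete, here is a small sketch that runs switch -regex over an in-memory array instead of a file (the sample strings and variable names are made up for illustration):

```powershell
$lines = 'DA', 'DB', 'S1', 'DC'

# continue: stop matching the current element, then move on to the next one.
$withContinue = switch -regex ($lines) {
    '^D'    { "D=$_"; continue }
    '^S'    { "S=$_"; continue }
    default { $_ }
}

# break: leave the switch entirely, so elements after the first "S" entry are never examined.
$withBreak = switch -regex ($lines) {
    '^D'    { "D=$_" }
    '^S'    { "S=$_"; break }
    default { $_ }
}
```

With continue, all four elements produce output; with break, processing stops after 'S1', so 'DC' is never reached.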

With your sample file, the above yields:

F123trial   text
DA00000038.95==xxx11
DA00000018.95==yyy11
DA00000018.95==zzzyy
S        000000000000076.85
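The switch statement above only writes the lines to the output stream. If the result should go back into the file, one possible approach is to collect the output in a variable first and then write it out. The sketch below assumes a throwaway file named sum-demo.txt in the temp directory (a hypothetical name), so no real file is overwritten, and it formats the sum with the invariant culture instead of using -replace:

```powershell
# Hypothetical sample file in the temp directory.
$path = Join-Path ([IO.Path]::GetTempPath()) 'sum-demo.txt'
Set-Content -Path $path -Value @(
    'F123trial   text'
    'DA00000038.95==xxx11'
    'DA00000018.95==yyy11'
    'DA00000018.95==zzzyy'
    'S        xxxxx'
)

[double] $sum = 0

# Collect all output lines first, so the file is fully read before it is rewritten.
$newLines = switch -regex -file $path {
    '^D'    { $sum += $_.Substring(2, 11); $_; continue }
    '^S'    { $_.Substring(0, 9) + $sum.ToString('000000000000000.00', [cultureinfo]::InvariantCulture); continue }
    default { $_ }
}

Set-Content -Path $path -Value $newLines

# Read the result back for inspection, then clean up.
$written = Get-Content -Path $path
Remove-Item -Path $path
```

Assigning the switch output to a variable means the input file is closed again before Set-Content reopens it for writing.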

Optional reading: comparing the performance of switch -file ... to Get-Content ... | ForEach-Object ...

Running the following test script:

& {
  # Create a sample file with 100K lines.
  1..1e5 > ($tmpFile = [IO.Path]::GetTempFileName())
  (Measure-Command { switch -file ($tmpFile) { default { $_ } } }).TotalSeconds, 
  (Measure-Command { get-content $tmpFile | % { $_ }  }).TotalSeconds
  Remove-Item $tmpFile
}

yields the following timings on my machine, for instance (the absolute numbers aren't important, but their ratio should give you a sense):

0.0578924   # switch -file
6.0417638   # Get-Content | ForEach-Object

That is, the pipeline-based solution is about 100 (!) times slower than the switch -file solution.


Digging deeper:

Frode F. points out that Get-Content is slow with large files - though its convenience makes it a popular choice - and mentions using the .NET Framework directly as an alternative:

  • Using [System.IO.File]::ReadAllLines(); however, given that it reads the entire file into memory, that is only an option with smallish files.

  • Using [System.IO.StreamReader]'s ReadLine() method in a loop.
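As a rough sketch of the second option applied to the summing task from the question, assuming a throwaway sample file (the file name streamreader-demo.txt is hypothetical) and using [decimal] rather than [double] for the total:

```powershell
# Hypothetical sample file in the temp directory.
$path = Join-Path ([IO.Path]::GetTempPath()) 'streamreader-demo.txt'
[IO.File]::WriteAllLines($path, [string[]] @(
    'F123trial   text'
    'DA00000038.95==xxx11'
    'DA00000018.95==yyy11'
    'DA00000018.95==zzzyy'
    'S        xxxxx'
))

$sum = [decimal] 0
$reader = New-Object IO.StreamReader $path
try {
    # ReadLine() returns $null at the end of the file.
    while ($null -ne ($line = $reader.ReadLine())) {
        if ($line.StartsWith('D')) {
            # Parse positions 3 to 13 with "." as the decimal mark, regardless of the current culture.
            $sum += [decimal]::Parse($line.Substring(2, 11), [cultureinfo]::InvariantCulture)
        }
    }
} finally {
    # Always release the file handle, even if parsing fails.
    $reader.Dispose()
}

Remove-Item -Path $path
```

Only one line is held in memory at a time, which is what makes this variant suitable for large files. .NET methods resolve relative paths against the process working directory rather than the current PowerShell location, which is why the timing scripts on this page pass paths through Convert-Path.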

However, use of the pipeline in itself, irrespective of the specific cmdlets used, introduces overhead. When performance matters - but only then - you should avoid it.

Here's an updated test that includes commands that use the .NET Framework methods, with and without the pipeline (the use of the collection operator .ForEach() requires PSv4+):

& {
  # Create a sample file with 100K lines.
  1..1e5 > ($tmpFile = [IO.Path]::GetTempFileName())
  
  (Measure-Command { switch -file ($tmpFile) { default { $_ } } }).TotalSeconds
  (Measure-Command { 
    $sr = [IO.StreamReader] (Convert-Path $tmpFile)
    while(-not $sr.EndOfStream) { $sr.ReadLine() }
    $sr.Close() 
  }).TotalSeconds
  (Measure-Command { [IO.File]::ReadAllLines((Convert-Path $tmpFile)).ForEach({ $_ }) }).TotalSeconds
  (Measure-Command { [IO.File]::ReadAllLines((Convert-Path $tmpFile)) | % { $_ } }).TotalSeconds
  (Measure-Command { Get-Content $tmpFile | % { $_ }  }).TotalSeconds
  
  Remove-Item $tmpFile
}

Sample results, from fastest to slowest:

0.0571143  # switch -file
0.2035162  # [System.IO.StreamReader] in a loop
0.6756535  # [System.IO.File]::ReadAllLines() with .ForEach() collection operator
1.5088355  # (pipeline) [System.IO.File]::ReadAllLines() with ForEach-Object
5.9815751  # (pipeline) Get-Content with ForEach-Object

switch -file is the fastest by a factor of around 3, followed by the .NET + loop solution; using .ForEach() adds another factor of 3. Simply introducing the pipeline (ForEach-Object instead of .ForEach()) adds another factor of 2; finally, using the pipeline with Get-Content and ForEach-Object adds another factor of 4.
