PowerShell: get the sum of a specific substring position
How can I use PowerShell to sum the numbers found at a fixed substring position in a file and place the sum at a specific position on a different line, given the following conditions:
Get the sum of the numbers at positions 3 to 13 of each line that starts with the character D. Place the sum at positions 10 to 14 of the line that starts with S.
So for example, if I have this file:
F123trial text
DA00000038.95==xxx11
DA00000018.95==yyy11
DA00000018.95==zzzyy
S        xxxxx
I want to get the sum of 38.95, 18.95 and 18.95 and then place that sum at the xxxxx position on the line that starts with S.
You could try:
-match to find the lines using a regex pattern
Substring() to extract the values from the "D"-lines
Measure-Object -Sum to calculate the sum
-replace to insert the value (searches using a regex pattern).
Ex:
$text = Get-Content -Path file.txt
$total = $text -match '^D' |
    # For each "D"-line, extract the value and cast to double (to be able to sum it)
    ForEach-Object { $_.Substring(2,11) -as [double] } |
    # Measure the sum
    Measure-Object -Sum | Select-Object -ExpandProperty Sum
# Format the sum as text; $total is a [double], so $total.Length would not give its printed width
$totalText = $total.ToString('0.00', [cultureinfo]::InvariantCulture)
$text | ForEach-Object {
    if($_ -match '^S') {
        # Line starts with S -> overwrite the placeholder starting at position 10 (index 9)
        $_.PadRight(9 + $totalText.Length).Remove(9, $totalText.Length).Insert(9, $totalText)
    } else {
        # Not "S"-line -> output original content
        $_
    }
} | Set-Content -Path file.txt
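The list above mentions -replace for inserting the value, but the example does the insertion with string methods. As a minimal sketch of the -replace variant, assuming the S-line looks like the one in the question ("S", 8 spaces, then a 5-character placeholder) and using a hard-coded, hypothetical sum:

```powershell
# Sample S-line as described in the question: "S", 8 spaces, 5-character placeholder.
$line = 'S' + (' ' * 8) + 'xxxxx'

# Hypothetical sum, formatted with "." as the decimal mark regardless of culture.
$sumText = (76.85).ToString('0.00', [cultureinfo]::InvariantCulture)

# Capture the first 9 characters, match the next 5, and replace the match
# with the captured prefix followed by the sum. ${1} keeps the group
# reference from running into the digits that follow it.
$newLine = $line -replace '^(.{9}).{5}', ('${1}' + $sumText)
$newLine
```

The pattern only works when the sum has the same width as the placeholder it replaces; a wider sum would shift whatever follows on the line.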
PowerShell's switch statement has powerful, but little-known features that allow you to iterate over the lines of a file (-file) and match lines by regular expressions (-regex).
Not only is switch -file convenient, it is also much faster than using cmdlets in a pipeline (see bottom section).
[double] $sum = 0
switch -regex -file file.txt {
  # Note: The string to the left of each script block below ({ ... }),
  #       e.g., '^D', is the regex to match each line against.
  #       Inside the script blocks, $_ refers to the input line at hand.
  # Extract number, add to sum, output the line.
  '^D' { $sum += $_.Substring(2, 11); $_; continue }
  # Summary line: place sum at character position 10, with 0-padding
  # Note: `-replace ',', '.'` is only needed if your culture uses "," as the
  #       decimal mark.
  '^S' { $_.Substring(0, 9) + '{0:000000000000000.00}' -f $sum -replace ',', '.'; continue }
  # All other lines: pass them through.
  default { $_ }
}
Note:
continue in the script blocks short-circuits further matching for the line at hand; by contrast, if you used break, no further lines would be processed.
The code places an 18-character, 0-left-padded number on the S line at character position 10.
With your sample file, the above yields:
F123trial text
DA00000038.95==xxx11
DA00000018.95==yyy11
DA00000018.95==zzzyy
S        000000000000076.85
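To illustrate the difference between continue and break mentioned in the note, here is a small sketch that runs switch -regex over an in-memory array instead of a file; the sample strings and variable names are made up for the illustration:

```powershell
$lines = 'D1', 'D2', 'S1'

# continue: skip the remaining clauses for this element, then move on to the next one.
$withContinue = switch -regex ($lines) {
    '^D' { "D-line: $_"; continue }
    '.'  { "other: $_" }
}

# break: leave the switch statement entirely, so 'D2' and 'S1' are never examined.
$withBreak = switch -regex ($lines) {
    '^D' { "D-line: $_"; break }
    '.'  { "other: $_" }
}

$withContinue   # D-line: D1, D-line: D2, other: S1
$withBreak      # D-line: D1
```

Without either keyword, every matching clause would run for each element, so the D-lines would also be reported by the '.' clause.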
Performance comparison of switch -file ... to Get-Content ... | ForEach-Object ...
Running the following test script:
& {
  # Create a sample file with 100K lines.
  1..1e5 > ($tmpFile = [IO.Path]::GetTempFileName())
  (Measure-Command { switch -file ($tmpFile) { default { $_ } } }).TotalSeconds,
  (Measure-Command { Get-Content $tmpFile | % { $_ } }).TotalSeconds
  Remove-Item $tmpFile
}
yields the following timings on my machine, for instance (the absolute numbers aren't important, but their ratio should give you a sense):
0.0578924 # switch -file
6.0417638 # Get-Content | ForEach-Object
That is, the pipeline-based solution is about 100 (!) times slower than the switch -file solution.
Digging deeper:
Frode F. points out that Get-Content is slow with large files, though its convenience makes it a popular choice, and mentions using the .NET Framework directly as an alternative:
Using [System.IO.File]::ReadAllLines(); however, given that it reads the entire file into memory, that is only an option with smallish files.
Using [System.IO.StreamReader]'s ReadLine() method in a loop.
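As a rough sketch of the StreamReader approach applied to the task from the question (summing positions 3 to 13 of the D-lines), assuming a temporary file that holds the question's sample data; the variable names are illustrative only:

```powershell
# Write the question's sample data to a temporary file (full path, as .NET requires).
$path = [IO.Path]::GetTempFileName()
Set-Content -Path $path -Value @(
    'F123trial text'
    'DA00000038.95==xxx11'
    'DA00000018.95==yyy11'
    'DA00000018.95==zzzyy'
    'S        xxxxx'
)

[double] $sum = 0
$reader = New-Object IO.StreamReader $path
try {
    # ReadLine() returns $null once the end of the file is reached.
    while ($null -ne ($line = $reader.ReadLine())) {
        if ($line.StartsWith('D')) {
            # Characters 3 to 13 (index 2, length 11) hold the number.
            $sum += [double] $line.Substring(2, 11)
        }
    }
} finally {
    # Release the file handle even if parsing fails.
    $reader.Dispose()
}
Remove-Item $path

[math]::Round($sum, 2)
```

Only one line is held in memory at a time, which is what makes this approach suitable for large files, unlike ReadAllLines().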
However, use of the pipeline in itself, irrespective of the specific cmdlets used, introduces overhead. When performance matters, but only then, you should avoid it.
Here's an updated test that includes commands that use the .NET Framework methods, with and without the pipeline (the use of collection operator .ForEach() requires PSv4+):
& {
  # Create a sample file with 100K lines.
  1..1e5 > ($tmpFile = [IO.Path]::GetTempFileName())
  (Measure-Command { switch -file ($tmpFile) { default { $_ } } }).TotalSeconds
  (Measure-Command {
    $sr = [IO.StreamReader] (Convert-Path $tmpFile)
    while(-not $sr.EndOfStream) { $sr.ReadLine() }
    $sr.Close()
  }).TotalSeconds
  (Measure-Command { [IO.File]::ReadAllLines((Convert-Path $tmpFile)).ForEach({ $_ }) }).TotalSeconds
  (Measure-Command { [IO.File]::ReadAllLines((Convert-Path $tmpFile)) | % { $_ } }).TotalSeconds
  (Measure-Command { Get-Content $tmpFile | % { $_ } }).TotalSeconds
  Remove-Item $tmpFile
}
Sample results, from fastest to slowest:
0.0571143 # switch -file
0.2035162 # [System.IO.StreamReader] in a loop
0.6756535 # [System.IO.File]::ReadAllLines() with .ForEach() collection operator
1.5088355 # (pipeline) [System.IO.File]::ReadAllLines() with ForEach-Object
5.9815751 # (pipeline) Get-Content with ForEach-Object
switch -file is the fastest by a factor of around 3, followed by the .NET + loop solution; using .ForEach() adds another factor of 3. Simply introducing the pipeline (ForEach-Object instead of .ForEach()) adds another factor of 2; finally, using the pipeline with Get-Content and ForEach-Object adds another factor of 4.