Remove Top Line of Text File with PowerShell
I am trying to remove just the first line of about 5000 text files before importing them.
I am still very new to PowerShell, so I am not sure what to search for or how to approach this. My current concept, in pseudo-code:
set-content file (get-content unless line contains amount)
However, I can't seem to figure out how to do something like contains.
It is not the most efficient in the world, but this should work:
get-content $file |
select -Skip 1 |
set-content "$file-temp"
move "$file-temp" $file -Force
While I really admire the answer from @hoge, both for a very concise technique and for a wrapper function that generalizes it, and I encourage upvotes for it, I am compelled to comment on the other two answers that use temp files (it gnaws at me like fingernails on a chalkboard!).
Assuming the file is not huge, you can force the pipeline to operate in discrete sections, thereby obviating the need for a temp file, with judicious use of parentheses:
(Get-Content $file | Select-Object -Skip 1) | Set-Content $file
... or in short form:
(gc $file | select -Skip 1) | sc $file
Using variable notation, you can do it without a temporary file:
${C:\file.txt} = ${C:\file.txt} | select -skip 1
function Remove-Topline ( [string[]]$path, [int]$skip=1 ) {
if ( -not (Test-Path $path -PathType Leaf) ) {
throw "invalid filename"
}
ls $path |
% { iex "`${$($_.fullname)} = `${$($_.fullname)} | select -skip $skip" }
}
I just had to do the same task, and gc | select ... | sc took over 4 GB of RAM on my machine while reading a 1.6 GB file. It didn't finish for at least 20 minutes after reading the whole file in (as reported by Read Bytes in Process Explorer), at which point I had to kill it.
My solution was to use a more .NET approach: StreamReader + StreamWriter. See this answer for a great discussion of the performance: In Powershell, what's the most efficient way to split a large text file by record type?
Below is my solution. Yes, it uses a temporary file, but in my case it didn't matter (it was a freaking huge file of SQL table creation and insert statements):
PS> (measure-command{
$i = 0
$ins = New-Object System.IO.StreamReader "in/file/pa.th"
$outs = New-Object System.IO.StreamWriter "out/file/pa.th"
while( !$ins.EndOfStream ) {
$line = $ins.ReadLine();
if( $i -ne 0 ) {
$outs.WriteLine($line);
}
$i = $i+1;
}
$outs.Close();
$ins.Close();
}).TotalSeconds
It returned:
188.1224443
Inspired by AASoft's answer, I went out to improve it a bit more:
Avoid the loop variable $i and the comparison with 0 in every loop
Wrap the execution in a try..finally block to always close the files in use
Use a variable $p to reference the current directory
These changes lead to the following code:
$p = (Get-Location).Path
(Measure-Command {
# Number of lines to skip
$skip = 1
$ins = New-Object System.IO.StreamReader ($p + "\test.log")
$outs = New-Object System.IO.StreamWriter ($p + "\test-1.log")
try {
# Skip the first N lines, but allow for fewer than N, as well
for( $s = 1; $s -le $skip -and !$ins.EndOfStream; $s++ ) {
$ins.ReadLine()
}
while( !$ins.EndOfStream ) {
$outs.WriteLine( $ins.ReadLine() )
}
}
finally {
$outs.Close()
$ins.Close()
}
}).TotalSeconds
The first change brought the processing time for my 60 MB file down from 5.3s to 4s. The rest of the changes are more cosmetic.
I just learned from a website:
Get-ChildItem *.txt | ForEach-Object { (get-Content $_) | Where-Object {(1) -notcontains $_.ReadCount } | Set-Content -path $_ }
Or you can use the aliases to make it short, like:
gci *.txt | % { (gc $_) | ? { (1) -notcontains $_.ReadCount } | sc -path $_ }
$x = get-content $file
$x[1..$x.count] | set-content $file
Just that much. Long boring explanation follows.
Get-Content returns an array. We can "index into" array variables, as demonstrated in this and other Scripting Guys posts.
For example, if we define an array variable like this,
$array = @("first item","second item","third item")
so $array returns
first item
second item
third item
then we can "index into" that array to retrieve only its 1st element
$array[0]
or only its 2nd
$array[1]
or a range of index values from the 2nd through the last.
$array[1..$array.count]
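One caveat with indexing: for a file with a single line, Get-Content returns a plain string rather than an array, so the index range no longer selects lines. A minimal sketch of a more defensive variant is shown below, using only in-memory data. The variable names are illustrative, and Select-Object -Skip is swapped in for the index range; it is not the method described in this answer.

```powershell
# Illustrative in-memory data; no file is touched.
$lines = @("header", "row 1", "row 2")

# @( ... ) forces an array result even when zero or one item comes back.
$rest = @($lines | Select-Object -Skip 1)
$rest.Count            # 2

# A "file" that contains only the header line.
$single = @("header only")
$restOfSingle = @($single | Select-Object -Skip 1)
$restOfSingle.Count    # 0
```

When reading from disk, the same idea would be @(Get-Content $file) so that the result is always an array.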
skip didn't work, so my workaround is:
$LinesCount = $(get-content $file).Count
get-content $file |
select -Last $($LinesCount-1) |
set-content "$file-temp"
move "$file-temp" $file -Force
Following on from Michael Soren's answer.
If you want to edit all .txt files in the current directory and remove the first line from each:
Get-ChildItem (Get-Location).Path -Filter *.txt |
Foreach-Object {
(Get-Content $_.FullName | Select-Object -Skip 1) | Set-Content $_.FullName
}
For smaller files you could use this:
& C:\windows\system32\more +1 oldfile.csv > newfile.csv | out-null
... but it's not very effective at processing my example file of 16MB. It doesn't seem to terminate and release the lock on newfile.csv.