
Remove Top Line of Text File with PowerShell

I am trying to just remove the first line of about 5000 text files before importing them.

I am still very new to PowerShell, so I am not sure what to search for or how to approach this. My current concept, in pseudocode:

set-content file (get-content unless line contains amount)

However, I can't seem to figure out how to do something like contains.

It is not the most efficient in the world, but this should work:

get-content $file |
    select -Skip 1 |
    set-content "$file-temp"
move "$file-temp" $file -Force

While I really admire the answer from @hoge, both for its very concise technique and for the wrapper function that generalizes it, and I encourage upvotes for it, I am compelled to comment on the other two answers that use temp files (it gnaws at me like fingernails on a chalkboard!).

Assuming the file is not huge, you can force the pipeline to operate in discrete sections, thereby obviating the need for a temp file, with judicious use of parentheses:

(Get-Content $file | Select-Object -Skip 1) | Set-Content $file

... or in short form:

(gc $file | select -Skip 1) | sc $file
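
To see why the parentheses matter, here is a small sketch that can be run against a throwaway file. The scratch path and the sample lines are made up for illustration. The parenthesized expression reads the whole file into memory and lets Get-Content close it before Set-Content opens the same path for writing.

```powershell
# Hypothetical scratch file, created only for this demonstration
$file = Join-Path ([System.IO.Path]::GetTempPath()) 'remove-top-line-demo.txt'
Set-Content -Path $file -Value 'header', 'row 1', 'row 2'

# The parentheses make Get-Content run to completion (and release the file)
# before Set-Content starts writing to the same path.
(Get-Content -Path $file | Select-Object -Skip 1) | Set-Content -Path $file

# Read the file back: only 'row 1' and 'row 2' should remain
$result = Get-Content -Path $file
$result
```

Because the whole file is held in memory, this form suits small and medium files rather than multi-gigabyte ones.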

Using variable notation, you can do it without a temporary file:

${C:\file.txt} = ${C:\file.txt} | select -skip 1

function Remove-Topline ( [string[]]$path, [int]$skip=1 ) {
  if ( -not (Test-Path $path -PathType Leaf) ) {
    throw "invalid filename"
  }

  ls $path |
    % { iex "`${$($_.fullname)} = `${$($_.fullname)} | select -skip $skip" }
}

I just had to do the same task, and gc | select ... | sc took over 4 GB of RAM on my machine while reading a 1.6 GB file. It didn't finish for at least 20 minutes after reading the whole file in (as reported by Read Bytes in Process Explorer), at which point I had to kill it.

My solution was to use a more .NET approach: StreamReader + StreamWriter. See this answer for a great discussion of the performance: In Powershell, what's the most efficient way to split a large text file by record type?

Below is my solution. Yes, it uses a temporary file, but in my case, it didn't matter (it was a freaking huge SQL table creation and insert statements file):

PS> (measure-command{
    $i = 0
    $ins = New-Object System.IO.StreamReader "in/file/pa.th"
    $outs = New-Object System.IO.StreamWriter "out/file/pa.th"
    while( !$ins.EndOfStream ) {
        $line = $ins.ReadLine();
        if( $i -ne 0 ) {
            $outs.WriteLine($line);
        }
        $i = $i+1;
    }
    $outs.Close();
    $ins.Close();
}).TotalSeconds

It returned:

188.1224443

Inspired by AASoft's answer, I set out to improve it a bit more:

  1. Avoid the loop variable $i and the comparison with 0 in every loop
  2. Wrap the execution into a try..finally block to always close the files in use
  3. Make the solution work for an arbitrary number of lines to remove from the beginning of the file
  4. Use a variable $p to reference the current directory

These changes lead to the following code:

$p = (Get-Location).Path

(Measure-Command {
    # Number of lines to skip
    $skip = 1
    $ins = New-Object System.IO.StreamReader ($p + "\test.log")
    $outs = New-Object System.IO.StreamWriter ($p + "\test-1.log")
    try {
        # Skip the first N lines, but allow for fewer than N, as well
        for( $s = 1; $s -le $skip -and !$ins.EndOfStream; $s++ ) {
            $ins.ReadLine()
        }
        while( !$ins.EndOfStream ) {
            $outs.WriteLine( $ins.ReadLine() )
        }
    }
    finally {
        $outs.Close()
        $ins.Close()
    }
}).TotalSeconds

The first change brought the processing time for my 60 MB file down from 5.3s to 4s. The rest of the changes are more cosmetic.
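
The same streaming idea can be wrapped in a reusable function. This is only a sketch: the name Remove-FirstLines, its parameters, and the scratch file names are hypothetical and not part of the answer above. It resolves paths through PowerShell first, because .NET resolves relative paths against the process working directory rather than the current PowerShell location.

```powershell
# Hypothetical helper: copies $Path to $Destination without its first $Skip lines.
function Remove-FirstLines {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(Mandatory)] [string] $Destination,
        [int] $Skip = 1
    )

    # Turn PowerShell paths into full file system paths for the .NET classes
    $inPath  = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($Path)
    $outPath = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($Destination)

    $reader = $null
    $writer = $null
    try {
        $reader = New-Object System.IO.StreamReader $inPath
        $writer = New-Object System.IO.StreamWriter $outPath

        # Discard up to $Skip lines; stop early if the file is shorter
        for ($i = 0; $i -lt $Skip -and -not $reader.EndOfStream; $i++) {
            $null = $reader.ReadLine()
        }
        # Stream the remaining lines one at a time
        while (-not $reader.EndOfStream) {
            $writer.WriteLine($reader.ReadLine())
        }
    }
    finally {
        # Close whichever streams were actually opened
        if ($writer) { $writer.Dispose() }
        if ($reader) { $reader.Dispose() }
    }
}

# Usage with made-up scratch files
$src = Join-Path ([System.IO.Path]::GetTempPath()) 'skip-demo-in.log'
$dst = Join-Path ([System.IO.Path]::GetTempPath()) 'skip-demo-out.log'
Set-Content -Path $src -Value 'line 1', 'line 2', 'line 3'
Remove-FirstLines -Path $src -Destination $dst -Skip 2
Get-Content -Path $dst
```

Only one line is held in memory at a time, so memory use stays flat regardless of file size. The output encoding is whatever StreamWriter uses by default, which may differ from the encoding of the input file.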

I just learned from a website:

Get-ChildItem *.txt | ForEach-Object { (get-Content $_) | Where-Object {(1) -notcontains $_.ReadCount } | Set-Content -path $_ }

Or you can use the aliases to make it short, like:

gci *.txt | % { (gc $_) | ? { (1) -notcontains $_.ReadCount } | sc -path $_ }

$x = get-content $file
$x[1..$x.count] | set-content $file

Just that much. Long boring explanation follows. Get-Content returns an array. We can "index into" array variables, as demonstrated in this and other Scripting Guys posts.

For example, if we define an array variable like this,

$array = @("first item","second item","third item")

so $array returns

first item
second item
third item

then we can "index into" that array to retrieve only its 1st element

$array[0]

or only its 2nd

$array[1]

or a range of index values from the 2nd through the last.

$array[1..$array.count]
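
Note that the range 1..$array.count runs one past the last valid index. In PowerShell's default (non-strict) mode, indices that fall outside the array are simply dropped from a slice, so the result is the same. A sketch with an explicit upper bound, plus the equivalent Select-Object form, using the same sample array (the variable names $tail and $sameTail are made up here):

```powershell
$array = @("first item", "second item", "third item")

# Valid indices run from 0 to Count - 1, so the last one here is 2
$tail = $array[1..($array.Count - 1)]
$tail

# Select-Object -Skip 1 yields the same elements without index arithmetic
$sameTail = $array | Select-Object -Skip 1
$sameTail
```

The explicit bound only makes sense when the array has at least two elements; for shorter input, Select-Object -Skip 1 is the safer choice because it just returns nothing.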

skip didn't work, so my workaround is:

$LinesCount = $(get-content $file).Count
get-content $file |
    select -Last $($LinesCount-1) | 
    set-content "$file-temp"
move "$file-temp" $file -Force

Following on from Michael Soren's answer.

If you want to edit all .txt files in the current directory and remove the first line from each:

Get-ChildItem (Get-Location).Path -Filter *.txt | 
Foreach-Object {
    (Get-Content $_.FullName | Select-Object -Skip 1) | Set-Content $_.FullName
}

Another approach to removing the first line from a file uses the multiple assignment technique:

 $firstLine, $restOfDocument = Get-Content -Path $filename 
 $modifiedContent = $restOfDocument 
 $modifiedContent | Out-String | Set-Content $filename
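
How multiple assignment splits a collection can be seen without touching any file. In this sketch the sample values are made up; the first variable receives the first element and the last variable receives everything that is left over.

```powershell
# Three sample "lines" standing in for the output of Get-Content
$firstLine, $restOfDocument = 'header', 'row 1', 'row 2'

$firstLine        # the single string 'header'
$restOfDocument   # an array holding 'row 1' and 'row 2'
```

If the input has only one line, the second variable ends up empty ($null), so writing it back produces an empty file.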

For smaller files you could use this:

& C:\windows\system32\more +1 oldfile.csv > newfile.csv | out-null

... but it's not very effective at processing my example file of 16MB. It doesn't seem to terminate and release the lock on newfile.csv.


 