How to remove 1 instance of each (identical) line in a text file in Windows PowerShell and group the remaining identical lines?

There is an unsorted text file of about 100 million short lines:

Lucy 
Mary 
Mary 
Mary 
John 
John 
John 
Lucy 
Mark
Mary

I need to get:

Mary 
Mary 
Mary 
John 
John 
Lucy

I cannot get the lines ordered by how many times each line is repeated in the text, i.e. the most frequently occurring lines must be listed first.

You could also use Group-Object to group equal lines together, like below:

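# Group identical lines and drop one instance from every group that has duplicates
Get-Content -Path 'D:\Test\unsorted.txt' | Group-Object | ForEach-Object {
    if ($_.Count -gt 1) { $_.Group | Select-Object -Skip 1 }
    else { $_.Group }   # lines that occur only once are kept
} | Sort-Object -Descending   # sorts the lines alphabetically, in descending order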

Result:

Mary 
Mary 
Mary
Mark
Lucy 
John 
John 

iRon may have a point that 'Mark' should not be in the output; I may have misinterpreted the question (remove one instance of each identical line) in the answer above.

If that is correct, then the code can be even simpler:

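# Trim each line, group identical lines, and drop one instance from every group
(Get-Content -Path 'D:\Test\unsorted.txt').Trim() | Group-Object | ForEach-Object {
    $_.Group | Select-Object -Skip 1
} | Sort-Object -Descending   # alphabetical, descending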

which will output:

Mary
Mary
Mary
Lucy
John
John
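
The question asks for the most frequent lines first, while Sort-Object -Descending on its own orders the lines alphabetically. A minimal sketch of one way to order by frequency, assuming the same Group-Object approach and using the question's sample data in a hypothetical $lines variable instead of the file:

```powershell
# Sample data from the question, used here in place of the file (assumption).
$lines = 'Lucy', 'Mary', 'Mary', 'Mary', 'John', 'John', 'John', 'Lucy', 'Mark', 'Mary'

$result = $lines |
    Group-Object |
    # Sort the groups by their size so the most frequent line comes first.
    Sort-Object -Property Count -Descending |
    # Drop one instance from every group; single-occurrence lines vanish.
    ForEach-Object { $_.Group | Select-Object -Skip 1 }

$result
```

For the sample data this yields Mary three times, John twice and Lucy once. Group-Object compares strings case-insensitively by default, so lines that differ only in case would land in the same group.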

$List = 'Lucy', 'Mary', 'Mary', 'Mary', 'John', 'John', 'John', 'Lucy', 'Mark', 'Mary'
$Count = @{}
foreach ($Item in $List) { $Count[$Item]++ }   # count the occurrences of each line
$Count.GetEnumerator() | Sort-Object -Descending 'Value' |
    ForEach-Object { ,$_.Name * ($_.Value - 1) }   # emit each line (count - 1) times

Result:

Mary
Mary
Mary
John
John
Lucy

Explanation

  • $Count = @{}
    Creates a new hashtable.
  • foreach ($Item in $List) { $Count[$Item]++ }
    Counts the occurrences of each item,
    • starting from nothing ($Null + 1 => 1).
  • $Count.GetEnumerator() | Sort-Object -Descending 'Value'
    Sorts the hashtable entries by their values (the counts), in descending order.
  • ForEach-Object { ,$_.Name * ($_.Value - 1) }
    Iterates over the sorted entries:
    • ,$_.Name wraps the string in a single-element array
    • ... * ($_.Value - 1) repeats that array one time fewer than the count
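
The steps above work on an in-memory list, while the question concerns a file of about 100 million lines. As a rough sketch, the same counting idea could be applied to a file by reading it line by line. The function name Get-RepeatedLines is hypothetical, and trimming each line is an assumption borrowed from the .Trim() call in the other answer:

```powershell
function Get-RepeatedLines {
    # Hypothetical helper name; not part of the original answer.
    param([Parameter(Mandatory)][string]$Path)

    # .NET resolves relative paths against the process directory,
    # so convert to a full path first.
    $fullPath = Convert-Path -LiteralPath $Path

    $count = @{}   # keys are case-insensitive by default
    # ReadLines enumerates lazily, so only the distinct lines and their
    # counts are kept in memory, not the whole file.
    foreach ($line in [System.IO.File]::ReadLines($fullPath)) {
        $count[$line.Trim()]++   # assumption: surrounding whitespace is ignored
    }

    $count.GetEnumerator() |
        Sort-Object -Property Value -Descending |
        ForEach-Object { ,$_.Key * ($_.Value - 1) }
}

# Possible usage (the output path is made up for illustration):
# Get-RepeatedLines -Path 'D:\Test\unsorted.txt' | Set-Content -Path 'D:\Test\result.txt'
```

Whether this is fast enough for a file of that size depends on how many distinct lines it contains, since the hashtable grows with the number of distinct lines rather than with the total line count.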

