How to remove 1 instance of each (identical) line in a text file and group the remaining identical lines in Windows PowerShell?

Question

There is an unsorted text file of about 100 million short lines:

Lucy 
Mary 
Mary 
Mary 
John 
John 
John 
Lucy 
Mark
Mary

I need to get:

Mary 
Mary 
Mary 
John 
John 
Lucy

I cannot get the lines ordered by how many times each line is repeated in the text, i.e. the most frequently occurring lines must be listed first.

Answer 1

You could also use Group-Object to group equal lines together, like below:

Get-Content -Path 'D:\Test\unsorted.txt' | Group-Object | ForEach-Object {
    if ($_.Count -gt 1) { $_.Group | Select-Object -Skip 1 }
    else { $_.Group }
} | Sort-Object -Descending

Result:

Mary 
Mary 
Mary
Mark
Lucy 
John 
John

iRon may have a point that 'Mark' should not be in the output, and I may have misinterpreted the question (remove one instance of each identical line) in the above answer.

If that is correct, then the code can be even simpler:

(Get-Content -Path 'D:\Test\unsorted.txt').Trim() | Group-Object | ForEach-Object {
    $_.Group | Select-Object -Skip 1 
} | Sort-Object -Descending

which will output:

Mary
Mary
Mary
Lucy
John
John
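Note that Sort-Object -Descending without a property sorts the lines alphabetically, which is why Lucy is listed before John above. If the most frequent lines should come first, as the question asks, one option is to sort the groups by their Count property before expanding them. The following is a minimal sketch using the sample data from the question in place of the file; the variable names $lines and $result are only illustrative.

```powershell
# Sample data from the question; for a real file this could come from Get-Content instead.
$lines = 'Lucy', 'Mary', 'Mary', 'Mary', 'John', 'John', 'John', 'Lucy', 'Mark', 'Mary'

# Group equal lines, put the largest group first, then emit each group minus one instance.
# A group with a single member (Mark) emits nothing after -Skip 1.
$result = @(
    $lines |
        Group-Object |
        Sort-Object -Property Count -Descending |
        ForEach-Object { $_.Group | Select-Object -Skip 1 }
)

$result
```

For the sample data this lists Mary three times, then John twice, then Lucy once.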

Answer 2

$List = 'Lucy', 'Mary', 'Mary', 'Mary', 'John', 'John', 'John', 'Lucy', 'Mark', 'Mary'
$Count = @{}
foreach ($Item in $List) { $Count[$Item]++ }
$Count.GetEnumerator() |Sort-Object -Descending 'Value' |
    ForEach-Object { ,$_.Name * ($_.Value - 1) }

Mary
Mary
Mary
John
John
Lucy

Explanation

$Count = @{}
Creates a new hashtable.
foreach ($Item in $List) { $Count[$Item]++ }
Counts how often each line occurs
- starting from nothing ( $Null + 1 => 1 )
$Count.GetEnumerator() | Sort-Object -Descending 'Value'
Sorts the hashtable entries by their values (the counts) in descending order.
ForEach-Object { ,$_.Name * ($_.Value - 1) }
Iterates over the sorted entries
- ,$_.Name wraps the string in a one-element array
- ... * ($_.Value - 1) repeats that array one time fewer than the count
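The answer above works on an in-memory list. For a file of about 100 million lines, the same counting approach might be applied while streaming the file, so that only the distinct lines and their counts are held in memory. The sketch below is one possible way to do that, not the answerer's code: it writes the question's sample data to a temporary file (a stand-in for the real file) and reads it back with [System.IO.File]::ReadLines. Trimming each line and skipping blank lines are assumptions about the input.

```powershell
# Stand-in for the real input: write the question's sample lines to a temporary file.
$path = [System.IO.Path]::GetTempFileName()
Set-Content -Path $path -Value 'Lucy', 'Mary', 'Mary', 'Mary', 'John', 'John', 'John', 'Lucy', 'Mark', 'Mary'

# Count occurrences while reading the file one line at a time.
$count = @{}
foreach ($line in [System.IO.File]::ReadLines($path)) {
    $key = $line.Trim()              # assumption: surrounding whitespace is not significant
    if ($key) { $count[$key]++ }     # assumption: blank lines are ignored
}

# Most frequent first; emit each line one time fewer than it was counted.
$result = @(
    foreach ($entry in ($count.GetEnumerator() | Sort-Object -Property Value -Descending)) {
        for ($i = 1; $i -lt $entry.Value; $i++) { $entry.Key }
    }
)

Remove-Item -Path $path
$result
```

ReadLines expects a full path, which GetTempFileName returns here; with a relative path it would resolve against the process working directory rather than the current PowerShell location.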


Answer 1
1 2023-01-04 12:15:56

Answer 2
0 2023-01-04 09:15:58
