How to remove 1 instance of each (identical) line in a text file in Windows PowerShell and group the remaining identical lines?
There is an unsorted text file of about 100 million short lines:
Lucy
Mary
Mary
Mary
John
John
John
Lucy
Mark
Mary
I need to get:
Mary
Mary
Mary
John
John
Lucy
I cannot get the lines ordered according to how many times each line is repeated in the text, i.e. the most frequently occurring lines must be listed first.
You could also use Group-Object to group equal lines together, like below:
Get-Content -Path 'D:\Test\unsorted.txt' | Group-Object | ForEach-Object {
if ($_.Count -gt 1) { $_.Group | Select-Object -Skip 1 }
else { $_.Group }
} | Sort-Object -Descending
Result:
Mary
Mary
Mary
Mark
Lucy
John
John
iRon may have a point that 'Mark' should not be in the output, and I may have misinterpreted the question (remove one instance of each identical line) in the above answer.
If that is correct, then the code can be even simpler:
(Get-Content -Path 'D:\Test\unsorted.txt').Trim() | Group-Object | ForEach-Object {
$_.Group | Select-Object -Skip 1
} | Sort-Object -Descending
which will output:
Mary
Mary
Mary
Lucy
John
John
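Note that Sort-Object -Descending without a property orders the lines alphabetically, not by how often they occur. If the frequency ordering from the question is needed, a sketch along the following lines might work. It is an assumption rather than part of the answer above: it sorts the groups by their Count property before dropping one instance of each, and it uses the sample data from the question in place of the file.

```powershell
# Sample data from the question; for the real file, this could be replaced with
# Get-Content -Path 'D:\Test\unsorted.txt'
$lines = 'Lucy', 'Mary', 'Mary', 'Mary', 'John', 'John', 'John', 'Lucy', 'Mark', 'Mary'

# Group equal lines, put the most frequent group first,
# then drop one instance from every group.
$result = $lines |
    Group-Object |
    Sort-Object -Property Count -Descending |
    ForEach-Object { $_.Group | Select-Object -Skip 1 }

$result
```

With the sample data this yields Mary three times, John twice and Lucy once. Groups with the same count have no guaranteed relative order.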
$List = 'Lucy', 'Mary', 'Mary', 'Mary', 'John', 'John', 'John', 'Lucy', 'Mark', 'Mary'
$Count = @{}
foreach ($Item in $List) { $Count[$Item]++ }
$Count.GetEnumerator() |Sort-Object -Descending 'Value' |
ForEach-Object { ,$_.Name * ($_.Value - 1) }
Mary
Mary
Mary
John
John
Lucy
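For a file of about 100 million lines, the same counting approach could be fed straight from the file instead of from an in-memory list. The following is only a sketch under stated assumptions: the function name Remove-OneInstance and its -Path parameter are hypothetical, and the .NET method [System.IO.File]::ReadLines is used to read the file lazily, line by line.

```powershell
# Hypothetical helper; the name and parameter are illustrative only.
# Usage: Remove-OneInstance -Path 'D:\Test\unsorted.txt'
function Remove-OneInstance {
    param([Parameter(Mandatory)][string]$Path)

    # Count occurrences while streaming the file line by line.
    # Pass a full path: .NET resolves relative paths against the process
    # working directory, which can differ from the PowerShell location.
    $Count = @{}
    foreach ($Line in [System.IO.File]::ReadLines($Path)) { $Count[$Line]++ }

    # Most frequent first; emit each line one time fewer than it occurred.
    $Count.GetEnumerator() | Sort-Object -Descending 'Value' |
        ForEach-Object { ,$_.Name * ($_.Value - 1) }
}
```

Only the distinct lines and their counts are kept in memory, which may help when the file contains many repeats. A hashtable created with @{} compares string keys case-insensitively, so 'mary' and 'Mary' would be counted together.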
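Explanation:
$Count = @{}
foreach ($Item in $List) { $Count[$Item]++ }
counts how often each item occurs, using a hashtable keyed by the item. (A key that does not exist yet evaluates to $Null, and $Null + 1 => 1.)
$Count.GetEnumerator() | Sort-Object -Descending 'Value'
sorts the entries by their count, most frequent first.
ForEach-Object { ,$_.Name * ($_.Value - 1) }
emits each name one time fewer than it was counted:
,$_.Name forces the string to an array
... * ($_.Value - 1) repeats the array 1 less times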
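The two operator behaviours this relies on can be tried in isolation. The snippet below is a small illustration only; the variable names $three and $none are made up for the demonstration.

```powershell
# A missing hashtable key evaluates to $null, and $null + 1 is 1,
# so ++ works on a key that has not been seen yet.
$Count = @{}
$Count['Mary']++
$Count['Mary']            # 1

# The unary comma wraps the string in a one-element array;
# multiplying an array by n repeats its elements n times.
$three = ,'Mary' * 3      # 'Mary', 'Mary', 'Mary'
$none  = ,'Mark' * 0      # empty array, so a line seen once yields no output
```

The last line shows why 'Mark', which occurs only once, disappears from the result: its count minus one is zero.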