
How to remove 1 instance of each (identical) line in a text file in Windows PowerShell and group the remaining identical lines?

There is an unsorted text file of about 100 million short lines:

Lucy 
Mary 
Mary 
Mary 
John 
John 
John 
Lucy 
Mark
Mary

I need to get

Mary 
Mary 
Mary 
John 
John 
Lucy

I cannot get the remaining lines ordered by how many times each line is repeated in the text; that is, the most frequently occurring lines must be listed first.

You could also use Group-Object to group equal lines together like below:

Get-Content -Path 'D:\Test\unsorted.txt' | Group-Object | ForEach-Object {
    if ($_.Count -gt 1) { $_.Group | Select-Object -Skip 1 }
    else { $_.Group }
} | Sort-Object -Descending

Result:

Mary 
Mary 
Mary
Mark
Lucy 
John 
John 

iRon may have a point that 'Mark' should not be in the output, and I may have misinterpreted the question (remove one instance of each identical line) in the answer above.

If that is correct, then the code can be even easier:

(Get-Content -Path 'D:\Test\unsorted.txt').Trim() | Group-Object | ForEach-Object {
    $_.Group | Select-Object -Skip 1 
} | Sort-Object -Descending

which will output

Mary
Mary
Mary
Lucy
John
John
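
Note that Sort-Object -Descending in the snippets above orders the output alphabetically, not by frequency. If the most frequent lines have to come first, as the question asks, one option is to sort the groups on their Count property before dropping one instance from each. The following is only a sketch under that assumption; the $lines and $result variables are illustrative, and the sample data is taken from the question.

```powershell
# Sample data from the question. For a real file, $lines could instead be
# filled with (Get-Content -Path 'D:\Test\unsorted.txt').Trim()
$lines = 'Lucy', 'Mary', 'Mary', 'Mary', 'John', 'John', 'John', 'Lucy', 'Mark', 'Mary'

$result = $lines |
    Group-Object |
    Sort-Object -Property Count -Descending |            # most frequent group first
    ForEach-Object { $_.Group | Select-Object -Skip 1 }   # drop one instance per group

$result
```

For the sample data this yields Mary three times, John twice and Lucy once. The relative order of groups that have the same count is not something this sketch controls.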

$List = 'Lucy', 'Mary', 'Mary', 'Mary', 'John', 'John', 'John', 'Lucy', 'Mark', 'Mary'
$Count = @{}
foreach ($Item in $List) { $Count[$Item]++ }
$Count.GetEnumerator() |Sort-Object -Descending 'Value' |
    ForEach-Object { ,$_.Name * ($_.Value - 1) }

Result:

Mary
Mary
Mary
John
John
Lucy

Explanation

  • $Count = @{}
    Create a new hashtable
  • foreach ($Item in $List) { $Count[$Item]++ }
    Count the repeating instances
    • starting from nothing ( $Null + 1 => 1 )
  • $Count.GetEnumerator() |Sort-Object -Descending 'Value'
    Sorts (descending) the hashtable based on the values
  • ForEach-Object { ,$_.Name * ($_.Value - 1) }
    Iterate over the counted entries
    • ,$_.Name wraps the string in a single-element array
    • ... * ($_.Value - 1) repeats that array one time fewer than the count, so a line that occurs only once produces no output
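
The hashtable approach explained above can also be fed from the file itself rather than from a hard-coded $List. The sketch below assumes that reading the roughly 100 million lines lazily with [System.IO.File]::ReadLines is preferable to loading them all through Get-Content, and that trailing whitespace should be ignored as in the Trim() variant of the other answer. The function name Get-RepeatedLine and the result.txt path are hypothetical, chosen only for illustration.

```powershell
# Hypothetical helper: counts each trimmed line of a file in a hashtable and
# emits every line (count - 1) times, most frequent first.
function Get-RepeatedLine {
    param([Parameter(Mandatory)][string]$Path)

    $Count = @{}   # note: PowerShell hashtable keys are case-insensitive
    # ReadLines streams the file line by line; pass a full path, because
    # .NET does not resolve relative paths against the PowerShell location.
    foreach ($Line in [System.IO.File]::ReadLines($Path)) {
        $Count[$Line.Trim()]++
    }

    $Count.GetEnumerator() | Sort-Object -Descending 'Value' |
        ForEach-Object { ,$_.Name * ($_.Value - 1) }
}

# Possible usage (the output path is an assumption):
# Get-RepeatedLine -Path 'D:\Test\unsorted.txt' | Set-Content -Path 'D:\Test\result.txt'
```

Only the distinct lines and their counts are kept in memory, which may help when the file contains many repeats of relatively few distinct values.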

