There is an unsorted text file of about 100 million short lines:
Lucy
Mary
Mary
Mary
John
John
John
Lucy
Mark
Mary
I need to get:
Mary
Mary
Mary
John
John
Lucy
I cannot work out how to order the lines by how many times each line is repeated in the text, i.e. the most frequently occurring lines must be listed first.
You could also use Group-Object to group equal lines together, like below:
Get-Content -Path 'D:\Test\unsorted.txt' | Group-Object | ForEach-Object {
    if ($_.Count -gt 1) { $_.Group | Select-Object -Skip 1 }
    else { $_.Group }
} | Sort-Object -Descending
Result:
Mary
Mary
Mary
Mark
Lucy
John
John
iRon may have a point that 'Mark' should not be in the output, and I may have misinterpreted the question (remove one instance of each identical line) in the answer above.
If that is correct, then the code can be even easier:
(Get-Content -Path 'D:\Test\unsorted.txt').Trim() | Group-Object | ForEach-Object {
    $_.Group | Select-Object -Skip 1
} | Sort-Object -Descending
which will output:
Mary
Mary
Mary
Lucy
John
John
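Note that the two Group-Object snippets above sort their output by name, so 'Lucy' ends up before 'John' even though 'John' occurs more often. If the frequency ordering from the question is what is wanted, one possible variation is to sort the groups by their Count before expanding them. This is only a sketch: it uses the sample lines from the question in a hypothetical $lines variable instead of reading the file.

```powershell
# Sample data from the question; for the real file this would come from Get-Content.
$lines = 'Lucy', 'Mary', 'Mary', 'Mary', 'John', 'John', 'John', 'Lucy', 'Mark', 'Mary'

$result = $lines | Group-Object |
    Sort-Object -Property Count -Descending |            # most frequent group first
    ForEach-Object { $_.Group | Select-Object -Skip 1 }  # drop one instance per group

$result
```

With the sample data this yields Mary three times, John twice and Lucy once; 'Mark' disappears because its only instance is skipped.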
$List = 'Lucy', 'Mary', 'Mary', 'Mary', 'John', 'John', 'John', 'Lucy', 'Mark', 'Mary'
$Count = @{}
foreach ($Item in $List) { $Count[$Item]++ }
$Count.GetEnumerator() | Sort-Object -Descending 'Value' |
    ForEach-Object { ,$_.Name * ($_.Value - 1) }
Mary
Mary
Mary
John
John
Lucy
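Since the question is about a file of roughly 100 million lines rather than an in-memory list, the same hashtable count could probably be fed straight from the file. The sketch below assumes the .NET method [System.IO.File]::ReadLines, which enumerates a file line by line, and writes a small hypothetical sample file to the temp folder so it can run as-is; for the real data, $Path would point at the actual file.

```powershell
# Hypothetical sample file holding the lines from the question.
$Path = Join-Path ([System.IO.Path]::GetTempPath()) 'unsorted-sample.txt'
Set-Content -Path $Path -Value 'Lucy', 'Mary', 'Mary', 'Mary', 'John', 'John', 'John', 'Lucy', 'Mark', 'Mary'

$Count = @{}
# ReadLines hands over one line at a time instead of loading the whole file.
foreach ($Line in [System.IO.File]::ReadLines($Path)) { $Count[$Line]++ }

$Result = $Count.GetEnumerator() | Sort-Object -Descending 'Value' |
    ForEach-Object { ,$_.Name * ($_.Value - 1) }   # each name, one time fewer than its count

$Result
```

Only the distinct lines and their counts are kept in the hashtable, so memory use should depend on the number of different lines rather than on the file size.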
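Explanation:
$Count = @{} and foreach ($Item in $List) { $Count[$Item]++ } count how often each item occurs in a hashtable (a key that does not exist yet yields $Null, and $Null + 1 => 1).
$Count.GetEnumerator() | Sort-Object -Descending 'Value' sorts the hashtable entries by their count, highest first.
ForEach-Object { ,$_.Name * ($_.Value - 1) } outputs each name one time fewer than its count: ,$_.Name forces the string to an array, and ... * ($_.Value - 1) repeats that array one time less than the count.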
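To see why the leading comma matters, the small sketch below compares multiplying a plain string with multiplying a one-element array, and shows the $Null + 1 behavior the counter relies on. The variable names are made up for illustration.

```powershell
$name = 'Mary'

$asString = $name * 3     # string multiplication concatenates: 'MaryMaryMary'
$asArray  = ,$name * 3    # array multiplication repeats elements: 'Mary', 'Mary', 'Mary'
$nothing  = ,$name * 0    # zero repeats gives an empty array, so nothing is output

$counter = @{}
$counter['Mary']++        # a missing key is $Null, and $Null + 1 is 1
```

Without the comma, a name that occurs four times would come out as a single concatenated line instead of three separate lines.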