
Find the string with the most occurrences in a .txt file with PowerShell

I'm currently working on a school assignment in PowerShell, and I have to display the word longer than 6 characters with the most occurrences in a .txt file. I tried this code, but it returns the number of occurrences for each word, which is not what I need. Please help.

$a = Get-Content -Path .\germinal_split.txt
foreach ($object in $a)
{
    if ($object.Length -gt 6) {
        $object | Group-Object | Sort-Object -Property "Count" -Descending | ft -Property ("Name", "Count");
    }
}
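The reason every word shows up with its own count is that, inside the foreach, Group-Object only ever receives a single line, so each group it produces has Count 1. A minimal sketch contrasting the two, assuming a hypothetical in-memory array $words in place of the lines of germinal_split.txt:

```powershell
# Hypothetical sample data standing in for the lines of germinal_split.txt
$words = @('mineurs', 'Maheu', 'mineurs', 'charbon', 'mineurs', 'charbon')

# Grouping inside the loop: Group-Object sees one line at a time,
# so every group it produces has Count = 1.
$perLine = foreach ($w in $words) {
    if ($w.Length -gt 6) { $w | Group-Object -NoElement }
}

# Grouping the whole filtered collection at once counts the duplicates.
$whole = $words | Where-Object { $_.Length -gt 6 } | Group-Object -NoElement

# Prints 1 for each of the five words longer than 6 characters
$perLine | ForEach-Object { $_.Count }

# Lists mineurs (3) before charbon (2)
$whole | Sort-Object Count -Descending | Select-Object Name, Count
```

The fix is therefore to pipe the whole collection into a single Group-Object call instead of calling it once per line.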

From the question we don't know what's in the text file. The approaches so far only work if there is exactly one word per line. I think something like the following will work regardless:

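$Content = (Get-Content 'C:\temp\test12-01-19' -Raw) -Split "\b"

$Content |
    Where-Object { $_.Length -gt 6 -and $_ -match '^\w+$' } |
    Group-Object -NoElement |
    Sort-Object Count -Descending |
    Select-Object -First 1

Here I'm reading in the file as a single string using the -Raw parameter, then splitting it on word boundaries (\b). Where-Object keeps only tokens longer than 6 characters that consist entirely of word characters; the second condition discards the runs of whitespace and punctuation that the split also produces. Group-Object then groups the tokens by the word itself, as in the other examples (grouping on the Length property would count how many words have each length, not how often each word occurs). Finally, Sort-Object -Descending puts the most frequent word first and Select-Object -First 1 keeps only that one.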
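A minimal sketch of this split-on-\b approach, assuming a hypothetical multi-word-per-line string $text in place of the file read with -Raw, and grouping on the word itself rather than on its length so the counts are per word:

```powershell
# Hypothetical sample text standing in for (Get-Content <file> -Raw)
$text = @'
The workers walked to the colliery. The workers were exhausted;
the company ignored the workers, and the company profited.
'@

# Split on word boundaries, keep pure word tokens longer than 6 characters,
# group on the token itself (case-insensitive by default), and take the top one.
$top = $text -split '\b' |
    Where-Object { $_.Length -gt 6 -and $_ -match '^\w+$' } |
    Group-Object -NoElement |
    Sort-Object Count -Descending |
    Select-Object -First 1

"$($top.Name): $($top.Count)"   # workers: 3
```

Note that "walked" (exactly 6 characters) is excluded, because the question asks for words longer than 6.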

I don't use the word-boundary regex very often. My concern is that it might behave oddly around punctuation, but my tests look pretty good.
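To see what \b actually does around punctuation, here is a small sketch on a hypothetical sample string. Punctuation and spaces end up in their own non-word tokens (which the word-character filter discards), but an apostrophe also counts as a boundary, so a contraction such as "aujourd'hui" is split into two separate words:

```powershell
# Hypothetical sample string with punctuation and an apostrophe
$tokens = "Hi, there! aujourd'hui" -split '\b'

# Print each token in brackets so whitespace and punctuation are visible
$tokens | ForEach-Object { "[$_]" }
```

Whether splitting contractions matters depends on the text; for word counts in French prose it can shift the totals.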

Let me know what you think.

You can do something like the following:

$a = Get-Content -Path .\germinal_split.txt
$a | Where Length -gt 6 | Group-Object -NoElement | Sort-Object Count -Descending

Explanation:

Where specifies the condition on the Length property. Because no -Property is given, Group-Object groups the lines by their string value, and -NoElement leaves off the Group property, which contains the actual object data. Sort-Object sorts the grouped output in ascending order by default. Here Count is specified as the sort property and the -Descending parameter reverses the default sort order, so the most frequent word comes first.
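A minimal sketch of this pipeline on a hypothetical in-memory array ($a filled with sample words, assumed to mimic one word per line) shows two details: appending Select-Object -First 1 keeps only the most frequent word, and Group-Object compares strings case-insensitively unless -CaseSensitive is given, which decides whether a capitalised word at the start of a sentence counts toward the same total:

```powershell
# Hypothetical one-word-per-line input, standing in for germinal_split.txt
$a = @('Travail', 'travail', 'charbon', 'Travail', 'mine', 'charbon')

# Default grouping is case-insensitive, so 'Travail' and 'travail' share a group
$top = $a | Where-Object Length -gt 6 | Group-Object -NoElement |
    Sort-Object Count -Descending | Select-Object -First 1
"$($top.Name): $($top.Count)"

# With -CaseSensitive the two spellings are counted separately
$caseSensitive = $a | Where-Object Length -gt 6 | Group-Object -NoElement -CaseSensitive |
    Sort-Object Count -Descending
$caseSensitive | Format-Table Name, Count -AutoSize
```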

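Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.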

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM