Find Most Frequent Words using LINQ

I have been trying to find the most frequent words in a list of strings. I have tried something like "Find the most occurring number in a List<int>",

but the issue is that it returns only one word, whereas I need all of the words that are tied for most frequent.

For example, if we run that LINQ query on the following list:

Dubai
Karachi
Lahore
Madrid
Dubai
Sydney
Sharjah
Lahore
Cairo

it should return:

ans: Dubai, Lahore

Group the words, keep the groups that occur more than once, and order them by count:

var result = list
  .GroupBy(s => s)
  .Where(g=>g.Count()>1)
  .OrderByDescending(g => g.Count())
  .Select(g => g.Key);

If you need all words that occur more than once:

List<string> list = new List<string>();
list.Add("A");
list.Add("A");
list.Add("B");
var most = (from i in list
            group i by i into grp
            orderby grp.Count() descending
            select new { grp.Key, Cnt = grp.Count() }).Where(r => r.Cnt > 1);

If you want all of the words tied for most frequent, you can use this method:

public List<string> GetMostFrequentWords(List<string> list)
{
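    // Materialize with ToList() so the grouping runs once; the result is read three times below.
    var groups = list.GroupBy(x => x).Select(x => new { word = x.Key, Count = x.Count() }).OrderByDescending(x => x.Count).ToList();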
    if (!groups.Any()) return new List<string>();

    var maxCount = groups.First().Count;

    return groups.Where(x => x.Count == maxCount).Select(x => x.word).OrderBy(x => x).ToList();
}

[TestMethod]
public void Test()
{
    var list = @"Dubai,Karachi,Lahore,Madrid,Dubai,Sydney,Sharjah,Lahore,Cairo".Split(',').ToList();
    var result = GetMostFrequentWords(list);

    Assert.AreEqual(2, result.Count);
    Assert.AreEqual("Dubai", result[0]);
    Assert.AreEqual("Lahore", result[1]);
}

In case you want Dubai, Lahore only (i.e. only the words with the top occurrence count, which is 2 in the sample):

  List<String> list = new List<String>() {
   "Dubai", "Karachi", "Lahore", "Madrid", "Dubai", "Sydney", "Sharjah", "Lahore", "Cairo"
   };

  int count = -1;

  var result = list
    .GroupBy(s => s, s => 1)
    .Select(chunk => new {
      name = chunk.Key,
      count = chunk.Count()
     })
    .OrderByDescending(item => item.count)
    .ThenBy(item => item.name)
    .Where(item => {
      if (count < 0) {
        count = item.count; // side effect, alas (the top count is not known a priori)

        return true;
      }
      else
        return item.count == count;
    })
    .Select(item => item.name);

Test:

  // ans: Dubai, Lahore
  Console.Write("ans: " + String.Join(", ", result));
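
The side effect in the Where lambda above can be avoided by materializing the counts first and then reading the maximum from them. The following is only a sketch of that idea, not the answer's own method; the class name WordFrequency and the method name MostFrequent are hypothetical, and it assumes the input contains no null entries.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordFrequency
{
    // Hypothetical helper: returns every word whose count equals the highest count.
    public static List<string> MostFrequent(IEnumerable<string> words)
    {
        // Count each distinct word and materialize, so the grouping runs only once.
        var counts = words
            .GroupBy(w => w)
            .Select(g => new { Word = g.Key, Count = g.Count() })
            .ToList();

        // Max() throws on an empty sequence, so handle empty input explicitly.
        if (counts.Count == 0)
            return new List<string>();

        int max = counts.Max(c => c.Count);

        // Keep only the words tied for the top count, sorted for a stable result.
        return counts
            .Where(c => c.Count == max)
            .Select(c => c.Word)
            .OrderBy(w => w, StringComparer.Ordinal)
            .ToList();
    }
}
```

Calling WordFrequency.MostFrequent(list) on the sample list above should give Dubai and Lahore, since both occur twice and every other city occurs once.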

I'm sure there must be a better way, but one thing I managed to put together (which may help you optimise it further) is the following:

List<string> list = new List<string>();
list.Add("Dubai");
list.Add("Sarjah");
list.Add("Dubai");
list.Add("Lahor");
list.Add("Dubai");
list.Add("Sarjah");
list.Add("Sarjah");

int most = list.GroupBy(i => i).OrderByDescending(grp => grp.Count())
    .Select(grp => grp.Count()).First();
IEnumerable<string> mostVal = list.GroupBy(i => i).OrderByDescending(grp => grp.Count())
    .Where(grp => grp.Count() >= most)
    .Select(grp => grp.Key);

This returns the entries that occur most frequently; if two entries occur with the same frequency, both are included.

NOTE: the result is not restricted to entries that occur more than once, so if every entry occurs exactly once, all of them are returned.
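
Since the snippet above groups the list twice, one possible optimisation is to count the words once into a dictionary and then filter by the maximum count. This is a rough sketch under assumptions, not the answer's code: the names FrequencyCounter and TopWords are made up for illustration, the input is assumed to contain no null entries, and the order of the returned words is not specified.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class FrequencyCounter
{
    // Hypothetical helper: counts each word once, then returns the words tied for the top count.
    public static List<string> TopWords(IEnumerable<string> words)
    {
        var counts = new Dictionary<string, int>();
        foreach (var word in words)
        {
            // TryGetValue leaves n at 0 when the word has not been seen yet.
            counts.TryGetValue(word, out int n);
            counts[word] = n + 1;
        }

        if (counts.Count == 0)
            return new List<string>();

        int most = counts.Values.Max();

        return counts
            .Where(pair => pair.Value == most)
            .Select(pair => pair.Key)
            .ToList();
    }
}
```

For the seven-item list above, where "Dubai" and "Sarjah" each occur three times, TopWords should return both of those entries.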
