
Sort a List by occurrences of search words with LINQ (C#)

I have stored data in a list like this:

 List<SearchResult> list = new List<SearchResult>();
 SearchResult sr = new SearchResult();
 sr.Description = "sample description";
 list.Add(sr);

Suppose the data stored in the Description field looks like this:

"JCB Excavator - ECU P/N: 728/35700"
"Geo Prism 1995 - ABS #16213899"
"Geo Prism 1995 - ABS #16213899"
"Geo Prism 1995 - ABS #16213899"
"Wie man BBA reman erreicht"
"this test JCB"
"Ersatz Airbags, Gurtstrammer und Auto Körper Teile"

Now I want to query the list with a search term such as "geo jcb".

If you look, the word "geo" is stored many times in the Description field. So I want to sort my list so that the entries in which the search-term words are found most often come first. Please help me do this. Thanks.

You can use string.Split and Enumerable.OrderByDescending with an anonymous type:

List<SearchResult> list = new List<SearchResult>() { 
    new SearchResult(){Description="JCB Excavator - ECU P/N: 728/35700"},
    new SearchResult(){Description="Geo Prism 1995 - ABS #16213899"},
    new SearchResult(){Description="Geo Prism 1995 - ABS #16213899"},
    new SearchResult(){Description="Geo Prism 1995 - ABS #16213899"},
    new SearchResult(){Description="Wie man BBA reman erreicht"},
    new SearchResult(){Description="this test JCB"},
    new SearchResult(){Description="Ersatz Airbags, Gurtstrammer und Auto Körper Teile"},
};

string[] searchTerms = new[]{"geo", "jcb"};
var results = 
    list.Select(sr => new { Searchresult = sr, Words = sr.Description.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries) })
        .OrderByDescending(x => x.Words.Count(w => searchTerms.Contains(w.ToLower())))
        .Select(x => x.Searchresult);
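A self-contained version of this approach, written as a top-level C# program (C# 9 or later), is sketched below. It assumes plain strings stand in for SearchResult.Description, and it swaps the ToLower() call for a HashSet built with StringComparer.OrdinalIgnoreCase, so the query can be typed in any case without lowercasing every word.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sample data from the question; plain strings stand in for SearchResult.Description.
var descriptions = new List<string>
{
    "JCB Excavator - ECU P/N: 728/35700",
    "Geo Prism 1995 - ABS #16213899",
    "Geo Prism 1995 - ABS #16213899",
    "Geo Prism 1995 - ABS #16213899",
    "Wie man BBA reman erreicht",
    "this test JCB",
    "Ersatz Airbags, Gurtstrammer und Auto Körper Teile",
};

// Search terms from the user's query, compared case-insensitively by the set.
var searchTerms = new HashSet<string>(
    "geo jcb".Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries),
    StringComparer.OrdinalIgnoreCase);

// Number of words in a description that are search terms.
int CountHits(string description) =>
    description.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
               .Count(word => searchTerms.Contains(word));

// OrderByDescending is a stable sort: rows with equal counts keep their input order.
var ranked = descriptions
    .Select(d => new { Description = d, Hits = CountHits(d) })
    .OrderByDescending(x => x.Hits)
    .ToList();

foreach (var row in ranked)
    Console.WriteLine($"{row.Hits}  {row.Description}");
```

With this sample data every matching description contains exactly one search term, so the five matches tie at 1 and keep their input order (the JCB excavator row stays first). Counting per row alone therefore does not push the Geo rows to the top; weighting each term by how often it occurs across the whole list does.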

You could use a simple regular expression: just combine your search terms in the pattern with |:

var re = new Regex("geo|JCB", RegexOptions.IgnoreCase);

Then count the number of matches in your description:

Console.WriteLine(re.Matches(description).Count); // Outputs '5' when description holds all of the sample text from the question

You could then order your list by that count:

var ordered = list.OrderByDescending(r => re.Matches(r.Description).Count);

Live example: http://rextester.com/MMAT58077
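The regex approach can be sketched as a self-contained top-level C# program (C# 9 or later), assuming plain strings stand in for the descriptions. Two changes from the hard-coded "geo|JCB" pattern are my own additions, not part of the answer: the pattern is built from the user's input with Regex.Escape, so terms containing characters such as "." or "+" are matched literally, and the alternation is wrapped in \b word boundaries, so "geo" would no longer match inside a word like "geography".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Sample data from the question.
var descriptions = new List<string>
{
    "JCB Excavator - ECU P/N: 728/35700",
    "Geo Prism 1995 - ABS #16213899",
    "Geo Prism 1995 - ABS #16213899",
    "Geo Prism 1995 - ABS #16213899",
    "Wie man BBA reman erreicht",
    "this test JCB",
    "Ersatz Airbags, Gurtstrammer und Auto Körper Teile",
};

// Build \b(?:geo|jcb)\b from the user's query instead of hard-coding it.
// Regex.Escape neutralizes metacharacters; \b restricts matches to whole words.
string[] terms = "geo jcb".Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
var re = new Regex(
    @"\b(?:" + string.Join("|", terms.Select(t => Regex.Escape(t))) + @")\b",
    RegexOptions.IgnoreCase);

// Total matches across all descriptions together.
int total = descriptions.Sum(d => re.Matches(d).Count);
Console.WriteLine(total); // prints 5

// Most matches first; rows with equal counts keep their input order.
var ordered = descriptions.OrderByDescending(d => re.Matches(d).Count).ToList();
foreach (var line in ordered)
    Console.WriteLine(line);
```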


Edit: According to your new question linked in the comments (hopefully you'll update the details of this question and let the duplicate die), you want to order the results so that the most common results show up earlier in the list.

To do this, you could first calculate the relative weighting of each search phrase and then use it to order the results.

Step 1: Calculate the weighting by counting how many descriptions in the entire data set contain each search word:

var wordsToFind = "Geo JCB".Split();
// number of descriptions in which each search phrase is found
var weights = wordsToFind.Select(w => new {
         Word = w,
         Weight = list.Count(x => x.Description.Contains(w))
    })
    .ToList(); // evaluate once; step 2 enumerates weights for every item

For the data currently in this question, this gives:

Geo: 3
JCB: 2
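Step 1 can be checked in isolation with the sketch below, a top-level C# program (C# 9 or later) that assumes plain strings stand in for the descriptions. It uses IndexOf with StringComparison.OrdinalIgnoreCase instead of Description.Contains(w), which is case-sensitive, so a query typed as "geo" would still find "Geo"; the ToList() call evaluates the weights once rather than on every enumeration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sample data from the question.
var descriptions = new List<string>
{
    "JCB Excavator - ECU P/N: 728/35700",
    "Geo Prism 1995 - ABS #16213899",
    "Geo Prism 1995 - ABS #16213899",
    "Geo Prism 1995 - ABS #16213899",
    "Wie man BBA reman erreicht",
    "this test JCB",
    "Ersatz Airbags, Gurtstrammer und Auto Körper Teile",
};

string[] wordsToFind = "Geo JCB".Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

// Weight of a term = number of descriptions that contain it, ignoring case.
var weights = wordsToFind
    .Select(w => new
    {
        Word = w,
        Weight = descriptions.Count(d => d.IndexOf(w, StringComparison.OrdinalIgnoreCase) >= 0),
    })
    .ToList(); // evaluate once instead of on every enumeration

foreach (var entry in weights)
    Console.WriteLine($"{entry.Word}: {entry.Weight}");
// prints:
// Geo: 3
// JCB: 2
```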

So you want all the Geo results first, followed by the JCB ones. A nice-to-have would be for the first result to be the one where Geo is mentioned most often.

Step 2: Use the weightings calculated in step 1 to order the search results:

var values = list.Select(x => new { 
      SearchResult = x, 
      Words = x.Description.Split(' ')
   })
   .Select(x => new { 
       SearchResult = x.SearchResult, 
       Weight = weights.Sum(w => x.Words.Contains(w.Word) ? w.Weight : 0)
   })
   .OrderByDescending(x => x.Weight)
   .Select(x => x.SearchResult);

Live example: http://rextester.com/SLH38676
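Both steps, plus the "nice-to-have" tie-break mentioned above, can be combined into one sketch. It is a top-level C# program (C# 9 or later) that assumes plain strings stand in for the descriptions, stores the weights in a case-insensitive dictionary, and adds a secondary ThenByDescending key, my own addition, that counts the search-term words in each description. With the sample data every matching row mentions its term once, so the tie-break has no visible effect here; it only matters when one description repeats a term.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sample data from the question.
var descriptions = new List<string>
{
    "JCB Excavator - ECU P/N: 728/35700",
    "Geo Prism 1995 - ABS #16213899",
    "Geo Prism 1995 - ABS #16213899",
    "Geo Prism 1995 - ABS #16213899",
    "Wie man BBA reman erreicht",
    "this test JCB",
    "Ersatz Airbags, Gurtstrammer und Auto Körper Teile",
};

// Split a description into words on spaces.
static string[] Words(string s) =>
    s.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

string[] wordsToFind = "Geo JCB".Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

// Step 1: weight = number of descriptions containing the term as a whole word.
var weights = wordsToFind.ToDictionary(
    w => w,
    w => descriptions.Count(d => Words(d).Contains(w, StringComparer.OrdinalIgnoreCase)),
    StringComparer.OrdinalIgnoreCase);

// Step 2: score = sum of the weights of the terms present in a description.
// Tie-break: descriptions with more search-term words come first.
var ranked = descriptions
    .Select(d =>
    {
        string[] words = Words(d);
        return new
        {
            Description = d,
            Score = weights.Where(kv => words.Contains(kv.Key, StringComparer.OrdinalIgnoreCase))
                           .Sum(kv => kv.Value),
            Hits = words.Count(w => weights.ContainsKey(w)),
        };
    })
    .OrderByDescending(x => x.Score)
    .ThenByDescending(x => x.Hits)
    .ToList();

foreach (var row in ranked)
    Console.WriteLine($"{row.Score} {row.Hits}  {row.Description}");
```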

List<SearchResult> list = new List<SearchResult>()
{
    new SearchResult { Description = "JCB Excavator - ECU P/N: 728/35700" },
    new SearchResult { Description = "Geo Prism 1995 - ABS #16213899" },
    new SearchResult { Description = "Geo Prism 1995 - ABS #16213899" },
    new SearchResult { Description = "Geo Prism 1995 - ABS #16213899" },
    new SearchResult { Description = "Wie man BBA reman erreicht" },
    new SearchResult { Description = "this test JCB" },
    new SearchResult { Description = "Ersatz Airbags, Gurtstrammer und Auto Körper Teile" }
};

var wordsToFind = "Geo JCB".Split();
var values = list
    .Select(x => new
    {
        SearchResult = x,
        Count = x.Description.Split(' ').Count(c => wordsToFind.Contains(c))
    })
    .OrderByDescending(x => x.Count)
    .Select(x => x.SearchResult);

var results = db.Blogs.AsEnumerable() // switch to in-memory LINQ: Regex.Split is not translatable to SQL
    .Select(sr => new
    {
        Searchresult = sr,
        // split Name and Name2 on runs of whitespace other than line breaks
        Words = Regex.Split(sr.Name, @"[^\S\r\n]{1,}").Union(Regex.Split(sr.Name2, @"[^\S\r\n]{1,}"))
    })
    // a word counts if it contains any of the search terms
    .OrderByDescending(x => x.Words.Count(w => searchTerms.Any(item => w.ToLower().Contains(item))))
    .Select(x => x.Searchresult);
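This last variant behaves differently from the word-equality versions in two ways: it matches substrings, and Union removes duplicate words across the two fields. The sketch below makes both visible with hypothetical in-memory rows (the Name/Name2 values are invented for illustration, not taken from the question) in a top-level C# program (C# 9 or later): "Geography" counts as a hit for "geo", and "Geo" appearing in both fields of one row counts only once.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical rows standing in for db.Blogs, each with two text fields.
var blogs = new[]
{
    new { Name = "Geography notes", Name2 = "JCB parts" },
    new { Name = "Geo Prism 1995", Name2 = "Geo Prism ABS" },
    new { Name = "Wie man BBA reman erreicht", Name2 = "" },
};

// Search terms are lowercase so they can be compared with lowercased words.
string[] searchTerms = { "geo", "jcb" };

// Split on runs of whitespace other than CR/LF.
string[] SplitWords(string s) => Regex.Split(s, @"[^\S\r\n]{1,}");

var results = blogs
    .Select(b => new
    {
        Blog = b,
        // Union removes duplicates, so a word repeated across fields counts once.
        Words = SplitWords(b.Name).Union(SplitWords(b.Name2)),
    })
    // Substring test: "geography" contains "geo", so it counts as a hit.
    .OrderByDescending(x => x.Words.Count(w => searchTerms.Any(t => w.ToLowerInvariant().Contains(t))))
    .Select(x => x.Blog)
    .ToList();

foreach (var blog in results)
    Console.WriteLine($"{blog.Name} | {blog.Name2}");
```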

The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM