
Sort list rows by the number of occurrences of a word with LINQ

How do I sort a list by the number of times a word occurs in each row using LINQ? Someone here gave me an answer that produces the right output for the first step. Here is the code:

void Main()
{
    List<SearchResult> list = new List<SearchResult>() { 
        new SearchResult(){ID=1,Title="Geo Prism GEO 1995 GEO* - ABS #16213899"},
        new SearchResult(){ID=2,Title="Excavator JCB - ECU P/N: 728/35700"},
        new SearchResult(){ID=3,Title="Geo Prism GEO 1995 - ABS #16213899"},
        new SearchResult(){ID=4,Title="JCB Excavator JCB- ECU P/N: 728/35700"},
        new SearchResult(){ID=5,Title="Geo Prism GEO,GEO 1995 - ABS #16213899 GEO"},
        new SearchResult(){ID=6,Title="dog"},
    };

    var to_search = new[] { "Geo", "JCB" };

    var result = from searchResult in list
                 let key_string = to_search.FirstOrDefault(ts => searchResult.Title.ToLower().Contains(ts.ToLower()))
                 group searchResult by key_string into Group
                 orderby Group.Count() descending
                 select Group;

    result.ToList().Dump(); // Dump() is LINQPad's output helper
}
// Define other methods and classes here
public class SearchResult
{
    public int ID { get; set; }
    public string Title { get; set; }
}

The output I get is:

ID Title 
-- ------
1  Geo Prism GEO 1995 GEO* - ABS #16213899 
3  Geo Prism GEO 1995 - ABS #16213899 
5  Geo Prism GEO,GEO 1995 - ABS #16213899 GEO 
2  Excavator JCB - ECU P/N: 728/35700 
4  JCB Excavator JCB- ECU P/N: 728/35700 
6  dog 

The above output is OK. All rows containing the word GEO come first because GEO matches the most rows: GEO is found in 3 rows and JCB in 2, so the JCB rows come next.

I need a second sort applied on top of the above output: within the GEO rows, the row that contains the word GEO the most times should come first, and likewise for the other groups. My output should then look like this:

ID Title 
-- ------
5  Geo Prism GEO,GEO 1995 - ABS #16213899 GEO 
1  Geo Prism GEO 1995 GEO* - ABS #16213899 
3  Geo Prism GEO 1995 - ABS #16213899 
4  JCB Excavator JCB- ECU P/N: 728/35700 
2  Excavator JCB - ECU P/N: 728/35700 
6  dog 

I found a LINQ query that counts the occurrences of a word in a string:

string text = @"Historically, the world of data and data the world of objects data" ;
string searchTerm = "data";
//Convert the string into an array of words
string[] source = text.Split(new char[] { '.', '?', '!', ' ', ';', ':', ',' },   StringSplitOptions.RemoveEmptyEntries);
var matchQuery = from word in source
             where word.ToLowerInvariant() == searchTerm.ToLowerInvariant()
             select word;
int wordCount = matchQuery.Count();

I got it from this URL.

How can I use the above code to sort by title? How can I add a second sort that counts the occurrences of the word in the Title field, so that my output looks like this:

ID Title 
-- ------
5  Geo Prism GEO,GEO 1995 - ABS #16213899 GEO 
1  Geo Prism GEO 1995 GEO* - ABS #16213899 
3  Geo Prism GEO 1995 - ABS #16213899 
4  JCB Excavator JCB- ECU P/N: 728/35700 
2  Excavator JCB - ECU P/N: 728/35700 
6  dog 

With WordCount defined as an extension method on string (see Extension Methods below), you can use a simple lambda expression:

list.OrderByDescending(sR => sR.Title.WordCount( to_search ))

If you want to omit every result that contains none of the search terms, add a Where clause, i.e.:

IEnumerable<SearchResult> results = list
                .Where( sR => sR.Title.WordCount( searchTerms ) > 0 )
                .OrderByDescending( sR => sR.Title.WordCount( searchTerms ) );
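
The query above evaluates WordCount twice per row, once in Where and once in OrderByDescending. A minimal sketch of computing the count once per row follows. The names FilterAndSortSketch, CountAll, and FilterAndSort are hypothetical and not part of the original answer; the separators are the same ones the answer's WordCount uses.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class SearchResult
{
    public int ID { get; set; }
    public string Title { get; set; }
}

public static class FilterAndSortSketch
{
    // Same separator set as the WordCount extension in this answer.
    private static readonly char[] Separators = { '.', '?', '!', ' ', ';', ':', ',' };

    // Total number of words in the text that equal any search term, ignoring case.
    private static int CountAll(string text, IEnumerable<string> searchTerms)
    {
        string[] words = text.Split(Separators, StringSplitOptions.RemoveEmptyEntries);
        return searchTerms.Sum(term =>
            words.Count(w => string.Equals(w, term, StringComparison.OrdinalIgnoreCase)));
    }

    // Hypothetical helper: project each row to (row, hit count) once,
    // drop rows without hits, then sort by the stored count.
    public static List<SearchResult> FilterAndSort(
        IEnumerable<SearchResult> list, string[] searchTerms)
    {
        return list
            .Select(r => new { Item = r, Hits = CountAll(r.Title, searchTerms) })
            .Where(x => x.Hits > 0)
            .OrderByDescending(x => x.Hits)
            .Select(x => x.Item)
            .ToList();
    }
}
```

For a list of this size the difference is negligible; the projection mainly helps when titles are long or the list is large.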

EDIT: If the search terms have priorities, you can sort once per term, starting with the lowest-priority term and finishing with the highest-priority one. This works because OrderByDescending in LINQ to Objects is a stable sort, so each later sort keeps the earlier ordering among rows that tie:

string[] searchTerms = new string[]{ "GEO","JCB" };
IEnumerable<SearchResult> results = list;
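foreach( string s in searchTerms.Reverse() ) {
    // Local copy: before C# 5 the lambda would capture the shared loop variable,
    // and the deferred sorts would all use the last term.
    string term = s;
    results = results
        .OrderByDescending( sR => sR.Title.WordCount( term ) );
}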
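
The same priority ordering can also be expressed as one OrderByDescending followed by ThenByDescending calls, which states the tie-breaking order directly. This is only a sketch under assumptions: PrioritySortSketch, CountWord, and SortByPriority are hypothetical names, and terms are expected from highest to lowest priority.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class SearchResult
{
    public int ID { get; set; }
    public string Title { get; set; }
}

public static class PrioritySortSketch
{
    // Same separator set as the WordCount extension in this answer.
    private static readonly char[] Separators = { '.', '?', '!', ' ', ';', ':', ',' };

    // Case-insensitive count of whole words equal to the term.
    private static int CountWord(string text, string term)
    {
        return text.Split(Separators, StringSplitOptions.RemoveEmptyEntries)
                   .Count(w => string.Equals(w, term, StringComparison.OrdinalIgnoreCase));
    }

    // Hypothetical helper: terms[0] is the primary sort key,
    // each following term only breaks ties left by the previous ones.
    public static IEnumerable<SearchResult> SortByPriority(
        IEnumerable<SearchResult> items, IList<string> terms)
    {
        if (terms.Count == 0) return items;

        IOrderedEnumerable<SearchResult> ordered =
            items.OrderByDescending(r => CountWord(r.Title, terms[0]));

        for (int i = 1; i < terms.Count; i++)
        {
            string term = terms[i]; // local copy so each lambda captures its own term
            ordered = ordered.ThenByDescending(r => CountWord(r.Title, term));
        }
        return ordered;
    }
}
```

Calling SortByPriority(list, new[] { "GEO", "JCB" }) should give the same order as the loop above, since both sort by the GEO count first and use the JCB count only to break ties.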

Extension Methods:

static class StringExtension
{
    public static int WordCount( this String text, string searchTerm )
    {
        string[] source = text.Split( new char[] { '.', '?', '!', ' ', ';', ':', ',' }, StringSplitOptions.RemoveEmptyEntries );
        var matchQuery = from word in source
                         where word.ToLowerInvariant() == searchTerm.ToLowerInvariant()
                         select word;
        int wordCount = matchQuery.Count();
        return wordCount;
    }

    public static int WordCount( this String text, IEnumerable<string> searchTerms )
    {
        int wordCount = 0;
        foreach( string searchTerm in searchTerms ) {
            wordCount += text.WordCount( searchTerm );
        }
        return wordCount;
    }
}
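
One caveat with the split-based count: the separator list does not include characters such as '*' or '-', so "GEO*" and "JCB-" in the sample titles are not counted as GEO or JCB. With the question's data, rows 2 and 4 would then both count one JCB and tie. A hedged sketch of a variant that treats every run of non-alphanumeric characters as a separator is shown here; TokenCountSketch and CountToken are hypothetical names, not part of the original answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TokenCountSketch
{
    // Hypothetical variant of WordCount: splits on any run of characters that are
    // neither letters nor decimal digits, then compares tokens ignoring case.
    public static int CountToken(string text, string searchTerm)
    {
        if (string.IsNullOrEmpty(text) || string.IsNullOrEmpty(searchTerm))
        {
            return 0;
        }

        // Regex.Split may yield empty strings at the ends;
        // they never equal a non-empty search term.
        return Regex.Split(text, @"[^\p{L}\p{Nd}]+")
                    .Count(token => string.Equals(token, searchTerm, StringComparison.OrdinalIgnoreCase));
    }
}
```

With this variant, "JCB Excavator JCB- ECU P/N: 728/35700" counts two occurrences of JCB and "Geo Prism GEO 1995 GEO* - ABS #16213899" counts three of GEO, which matches the order requested in the question.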

How about this:

IEnumerable<SearchResult> result =
    from searchResult in list
    let key_string = to_search.FirstOrDefault(ts => searchResult.Title.ToLower().Contains(ts.ToLower()))
    group searchResult by key_string into Group
    orderby Group.Count() descending
    from item in Group.OrderByDescending(theItem => WordCount(theItem.Title, Group.Key))
    select item;

Using the following WordCount method:

public static int WordCount( String text, string searchTerm )
{
    // The group of titles that match no search term has a null key.
    if( searchTerm == null )
    {
        return 0;
    }
    string[] source = text.Split( new char[] { '.', '?', '!', ' ', ';', ':', ',' }, StringSplitOptions.RemoveEmptyEntries );
    var matchQuery = from word in source
                     where word.ToLowerInvariant() == searchTerm.ToLowerInvariant()
                     select word;
    int wordCount = matchQuery.Count();
    return wordCount;
}

One small issue: titles that contain none of the search words are grouped together under a null key, so if that group is the largest it is placed in front of the titles that do match.
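
One possible way around that is to order by whether the group key is null before ordering by group size, so the unmatched group always goes last. The sketch below is an assumption about how that could look, not the answerer's code; GroupedSortSketch and Sort are hypothetical names, and Contains on lower-cased strings is replaced by a case-insensitive IndexOf.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class SearchResult
{
    public int ID { get; set; }
    public string Title { get; set; }
}

public static class GroupedSortSketch
{
    private static readonly char[] Separators = { '.', '?', '!', ' ', ';', ':', ',' };

    // Case-insensitive whole-word count; a null term (unmatched group) counts as 0.
    private static int WordCount(string text, string searchTerm)
    {
        if (searchTerm == null) return 0;
        return text.Split(Separators, StringSplitOptions.RemoveEmptyEntries)
                   .Count(w => string.Equals(w, searchTerm, StringComparison.OrdinalIgnoreCase));
    }

    public static List<SearchResult> Sort(IEnumerable<SearchResult> list, string[] toSearch)
    {
        var query =
            from searchResult in list
            let keyString = toSearch.FirstOrDefault(ts =>
                searchResult.Title.IndexOf(ts, StringComparison.OrdinalIgnoreCase) >= 0)
            group searchResult by keyString into g
            // false sorts before true, so groups with a real key come first
            orderby g.Key == null, g.Count() descending
            from item in g.OrderByDescending(r => WordCount(r.Title, g.Key))
            select item;

        return query.ToList();
    }
}
```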

After this query:

var result = from searchResult in list
             let key_string = to_search.FirstOrDefault(ts => searchResult.Title.ToLower().Contains(ts.ToLower()))
             group searchResult by key_string into Group
             orderby Group.Count() descending
             select Group;

You want something like this:

foreach (var group in result) {
    foreach (var item in group.OrderByDescending(theItem => WordCount(theItem.Title, group.Key))) {
        Console.WriteLine(item.Title);
    }
}

With an added method that looks like:

public static int WordCount(string haystack, string needle) {
    if (needle == null) {
        return 0;
    }
    string[] source = haystack.Split(new char[] { '.', '?', '!', ' ', ';', ':', ',' }, StringSplitOptions.RemoveEmptyEntries);
    var matchQuery = from word in source
                        where word.ToLowerInvariant() == needle.ToLowerInvariant()
                        select word;
    return matchQuery.Count();
}
