Simultaneous sort (C#)?

I have a word-frequency list that contains strings in alphabetical order and unsorted ints that represent the frequency of each word (the query does not need to be read from a text file, because the user types a "(letter) (number)" query in the console). I don't need to count the words; I need to print the most frequent words for each specific query typed in the console, such as "AA 12". In this case the query starts with "A", so ideally I would retrieve the most frequent words matching StartsWith("A"), at least 5 of them, in descending order of frequency but at the same time in A-Z order.

I have read a lot about BSTs, Dictionary, Tuple, SortedList, List, SortedSet, LINQ, and algorithms books, and I learned that keys and values can be sorted ascending, descending, or A-Z, but not simultaneously. Can someone explain how I can feed a query such as "AA 12", which I have already split into string a = "AA"; and int b = 12;, into a BST (binary search tree) of string, int in word-frequency style? I don't need to count anything, just to apply a query that retrieves the 5 most frequent words matching the string and the int from this 100000-word frequency list and prints them to the console, like Google Search autocomplete but more basic.

sample word-frequency AZ list:

AA 12
AAA 32
AAB 4
AABB 38
BBAA 3
CDDDA 76
...
YZZZ 45
ZZZZZY 356

user-query: "AA 15"

ideal answer:

AAA
AA
AABB
AAB

The code:

 var list = new List<KeyValuePair<string, int>>();
 StreamReader sr = new StreamReader("C:\\dicti.txt");

 while (true)
 {
      string line = sr.ReadLine();   //read each line
      string[] ln;
      if (line == null) break;            // no more lines
      try
      {
           ln = line.Split(default(string[]), StringSplitOptions.RemoveEmptyEntries);
           string a = ln[0];
           int b = Convert.ToInt32(ln[1]);

           list.Add(new KeyValuePair<string, int>(a, b));       
      }
      catch (IndexOutOfRangeException)
      {
           break;
      }
 }   // end of the read loop

 string word = Console.ReadLine();

 string[] ln2;
 ln2 = word.Split(default(string[]), StringSplitOptions.RemoveEmptyEntries);
 string am = ln2[0];
 int bm = Convert.ToInt32(ln2[1]);

This is the code I've written so far. I'm somewhat lost on how to get the values sorted both alphabetically and by frequency, relative to the first letter of the user query.


This is my current version of the code. It takes 1:15 minutes to process a complete 1000-word frequency list, so I want to know how I can improve my lambdas to meet the requirement of 15 seconds for a 1000-word frequency list, or what else I can do if lambdas won't work.

    static void Main(string[] args)
    {
        var dic = new Dictionary<string, int>();


        int counter = 0;   // named "counter" to match its use in the loop condition below

        StreamReader sr = new StreamReader("C:\\dicti.txt");

        while (true)
        {

            string line = sr.ReadLine();   // To read lines
            string[] ln;
            if (line == null) break;            // There are no more lines
            try
            {
                ln = line.Split(default(string[]), StringSplitOptions.RemoveEmptyEntries);
                string a = ln[0];
                int b = Convert.ToInt32(ln[1]);

                dic.Add(a,b);   

            }
            catch (IndexOutOfRangeException) { break; }

        }

        string[] ln2;
        string am,word;
        int bm;
        do
        {
            //counter++;
            do
            {
                word = Console.ReadLine();



                ln2 = word.Split(default(string[]), StringSplitOptions.RemoveEmptyEntries);

                    am = ln2[0];

                    bm = Convert.ToInt32(ln2[1]);

            } while (!(am.Length >= 2 && bm >= 1 && bm <= 1000000 )); 

            if (true)
            {
                var aj = (dic.Where(x => x.Value >= bm).Where(x => x.Key.StartsWith(am)).OrderByDescending(d => d.Value).Take(2));


                foreach (var p in aj)
                {


                        Console.WriteLine("{0} ", p.Key);



                }

            }
        } while (counter < 1001);



    }

}

}

Do you want something like this?

    public static IEnumerable<KeyValuePair<string, int>> SearchAndSortBy(Dictionary<string, int> fullSet, string searchFilter)
    {
        return fullSet.Where((pair) => pair.Key.Contains(searchFilter)).OrderByDescending((pair) => pair.Value);
    }

Then you use it like this:

        var mySet = new Dictionary<string, int>();
        mySet.Add("AA", 12);
        mySet.Add("AAA", 32);
        mySet.Add("AAB", 4);
        mySet.Add("AABB", 38);
        mySet.Add("BBAA", 3);
        mySet.Add("CDDDA", 76);
        //...
        mySet.Add("YZZZ", 45);
        mySet.Add("ZZZZZY", 356);

        var results = SearchAndSortBy(mySet, "AA");
        foreach (var item in results)
        {
            Console.Write(item.Key);
            Console.Write(" ");
            Console.WriteLine(item.Value);
        }

And when I run it, I get these results:

AABB 38
AAA 32
AA 12
AAB 4
BBAA 3

I could even change the foreach loop to:

    foreach (var item in results.Take(5))

If I only wanted the top 5.
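
If the match should be a prefix match with a minimum frequency, as in the `Where` clauses of the question's second listing, the same pattern could be adapted roughly as follows. This is only a sketch: `SearchByPrefix`, `minFrequency`, and `maxResults` are hypothetical names that do not appear in the question, and the ordinal (culture-independent) comparison and the A-Z tie-break are my assumptions about how the words should be compared.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PrefixSearchSketch
{
    // Hypothetical helper: keeps words that start with `prefix` and whose
    // frequency is at least `minFrequency`, most frequent first.
    public static List<KeyValuePair<string, int>> SearchByPrefix(
        IEnumerable<KeyValuePair<string, int>> fullSet,
        string prefix, int minFrequency, int maxResults)
    {
        return fullSet
            .Where(pair => pair.Value >= minFrequency
                        && pair.Key.StartsWith(prefix, StringComparison.Ordinal))
            .OrderByDescending(pair => pair.Value)
            .ThenBy(pair => pair.Key, StringComparer.Ordinal) // A-Z tie-break (assumption)
            .Take(maxResults)
            .ToList();
    }

    public static void Main()
    {
        var mySet = new Dictionary<string, int>
        {
            { "AA", 12 }, { "AAA", 32 }, { "AAB", 4 },
            { "AABB", 38 }, { "BBAA", 3 }, { "CDDDA", 76 }
        };

        foreach (var item in SearchByPrefix(mySet, "AA", 12, 5))
        {
            Console.WriteLine("{0} {1}", item.Key, item.Value);
        }
    }
}
```

With the sample data and the query "AA 12", this keeps AABB, AAA, and AA (in that order) and drops AAB and BBAA. Filtering before sorting means only the matching entries are sorted, which may also help with the timing concern.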

I think you can tweak the OrderBy to achieve your search requirements. Let's take a quick look:

Your input:

AA 12
AAA 32
AAB 4
AABB 38
BBAA 3
CDDDA 76

Desired result for searching "AA"

AAA
AA
AABB
AAB

So AAA comes before AA because it has a higher frequency, but AABB comes after both of them, despite its higher frequency, because AAA < AABB alphabetically. Now here comes the problem: AA < AAA also holds, so if you sort your keys alphabetically then AA will always appear before AAA regardless of its frequency.

But if you "continue" each word with its last character then you get what you want by first sorting alphabetically and then by frequency:

public static IEnumerable<KeyValuePair<string, int>> FilterAndSort(IEnumerable<KeyValuePair<string, int>> fullSet, string searchFilter, int maxKeyLength)
{
    return fullSet
            .Where(p => p.Key.StartsWith(searchFilter))
            .OrderBy(p => p.Key.PadRight(maxKeyLength, p.Key.Last()))
            .ThenByDescending(p => p.Value);
}

Test:

List<KeyValuePair<string, int>> list = new List<KeyValuePair<string,int>>
{
    new KeyValuePair<string, int>("AA", 12),
    new KeyValuePair<string, int>("AAA", 32),
    new KeyValuePair<string, int>("AAB", 4),
    new KeyValuePair<string, int>("AABB", 38),
    new KeyValuePair<string, int>("BBAA", 3),
    new KeyValuePair<string, int>("CDDDA", 76),
};

foreach (var p in FilterAndSort(list, "AA", list.Max(p => p.Key.Length)))
{
    Console.WriteLine("{0} {1}", p.Key, p.Value);
} 

Output:

AAA 32
AA 12
AABB 38
AAB 4

You can optimize it by precomputing the padded words when you read the list. In this case you might want to use a Tuple<string, string, int> (original word, padded word, frequency) instead of a KeyValuePair. It will take up a bit more memory, but you only have to do the padding once instead of on every filter.
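
A minimal sketch of that precomputation might look like the following. The class name `PaddedKeySketch` and the helper `Precompute` are hypothetical names introduced here for illustration, the ordinal comparisons are my assumption, and the code assumes every word is non-empty (`Last()` throws on an empty string).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PaddedKeySketch
{
    // Builds (original word, padded word, frequency) tuples once, up front.
    // Assumes every key is non-empty.
    public static List<Tuple<string, string, int>> Precompute(
        IEnumerable<KeyValuePair<string, int>> source)
    {
        var items = source.ToList();
        if (items.Count == 0) return new List<Tuple<string, string, int>>();

        int maxKeyLength = items.Max(p => p.Key.Length);
        return items
            .Select(p => Tuple.Create(
                p.Key,
                p.Key.PadRight(maxKeyLength, p.Key.Last()), // "continue" with last char
                p.Value))
            .ToList();
    }

    // Same filter and ordering as FilterAndSort, but on the precomputed tuples.
    public static IEnumerable<Tuple<string, string, int>> FilterAndSort(
        IEnumerable<Tuple<string, string, int>> precomputed, string searchFilter)
    {
        return precomputed
            .Where(t => t.Item1.StartsWith(searchFilter, StringComparison.Ordinal))
            .OrderBy(t => t.Item2, StringComparer.Ordinal)
            .ThenByDescending(t => t.Item3);
    }

    public static void Main()
    {
        var list = new List<KeyValuePair<string, int>>
        {
            new KeyValuePair<string, int>("AA", 12),
            new KeyValuePair<string, int>("AAA", 32),
            new KeyValuePair<string, int>("AAB", 4),
            new KeyValuePair<string, int>("AABB", 38),
            new KeyValuePair<string, int>("BBAA", 3),
            new KeyValuePair<string, int>("CDDDA", 76),
        };

        var precomputed = Precompute(list); // done once, after reading the list
        foreach (var t in FilterAndSort(precomputed, "AA"))
        {
            Console.WriteLine("{0} {1}", t.Item1, t.Item3);
        }
    }
}
```

Only the padding is moved out of the query path here; each query still filters and sorts the matching entries.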
