Possible to do this in better than O(n^2) time?

The problem I'm trying to solve gives me a matrix like

10101
11100
11010
00101

where each row represents the topics that a person knows; e.g. Person 1, represented by 10101, knows topics 1, 3 and 5, but not 2 or 4. I need to find the maximum number of topics that a 2-person team could know; e.g. the team of Person 1 and Person 3 knows all the topics, because between 10101 and 11010 there is a 1 at every index.

I have an O(n^2) solution

    string[] topic = new string[n];
    for(int topic_i = 0; topic_i < n; topic_i++)
    {
       topic[topic_i] = Console.ReadLine();   
    }
    IEnumerable<int> teamTopics = 
        from t1 in topic
        from t2 in topic
        where !Object.ReferenceEquals(t1, t2)
        select t1.Zip(t2, (c1, c2) => c1 == '1' || c2 == '1').Sum(b => b ? 1 : 0);
    int max = teamTopics.Max();
    Console.WriteLine(max);

which passes every test case that it doesn't time out on. I suspect the reason it's not fast enough has to do with the time complexity rather than the overhead of the LINQ machinery. But I can't think of a better way to do it.

I thought that maybe I could map the indices of topics to the persons who know them, like

1 -> {1,2,3}
2 -> {2,3}
3 -> {1,2,4}
4 -> {3}
5 -> {1,4}

but I can't think of where to go from there.

Can you supply me with a "hint"?

Let's say we have n people and m topics.

I would argue that your algorithm is O(n^2 * m), because:

  1. from t1 in topic gets you O(n)
  2. from t2 in topic gets you to O(n^2)
  3. t1.Zip(t2 ... gets you to O(n^2 * m)

An optimisation that I see is to first rearrange the strings so that there is one per topic rather than one per person:

  • s1 = '0101', where the i-th character shows whether person i knows the 1st topic
  • s2 = '1111', where the i-th character shows whether person i knows the 2nd topic.

etc...

Then you analyse string s1. Pick all possible pairs of 1s (O(n^2) pairs); these are the pairs of people that together know the 1st topic. Then take a pair from that list and check whether they know the 2nd topic as well, and so on. When they don't, delete the pair from the list and move on to another pair.

Unfortunately this looks to be O(n^2 * m) as well, but it should be quicker in practice. For a very sparse matrix it should be close to O(n^2), and for dense matrices it should find a pair pretty soon.
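
The per-topic strings described above can be built by transposing the input. The following is only a minimal sketch of that first step, not the answerer's own code; the names TopicColumns and Transpose are made up for illustration, and it assumes every row has the same length. It uses the matrix from the question rather than the '0101' / '1111' strings above.

```csharp
using System;
using System.Linq;

public static class TopicColumns
{
    // Element t of the result is the string for topic t + 1:
    // its i-th character says whether person i + 1 knows that topic.
    // Assumes all rows have the same length.
    public static string[] Transpose(string[] rows)
    {
        int m = rows[0].Length;
        return Enumerable.Range(0, m)
            .Select(t => new string(rows.Select(r => r[t]).ToArray()))
            .ToArray();
    }

    public static void Main()
    {
        var rows = new[] { "10101", "11100", "11010", "00101" };
        foreach (var s in Transpose(rows))
            Console.WriteLine(s);   // 1110, 0110, 1101, 0010, 1001
    }
}
```

The five output strings match the topic -> persons map sketched in the question (topic 1 is known by persons 1, 2 and 3, and so on).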

Thoughts:

  • as a speculative optimization: you could do an O(n) sweep to find the individual with the highest number of skills (largest hamming weight); note them, and stop if they have everything: pair them with anyone, it doesn't matter
  • you can exclude, without testing, anyone who only has skills in common with the "best" individual - we already know about everything they can offer and have tested against everyone; so only test if (newSkills & ~bestSkills) != 0 - meaning: the person being tested has something that the "best" worker didn't have; this leaves m workers with complementary skills plus the "best" worker (you must include them explicitly, as the ~ / !=0 test above will fail for them)
  • now do another O(m) sweep of possible partners - checking to see if the "most skilled" plus any other gives you all the skills (obviously stop earlier if a single member has all the skills); but either way: keep track of best combination for later reference
  • you can further halve the time by only considering the triangle, not the square - meaning: you compare row 0 to rows 1 through (m-1), but row 1 to rows 2 through (m-1), row 5 to rows 6 through (m-1), etc
  • you can significantly improve things by using integer bit math along with an efficient "hamming weight" algorithm (to count the set bits) rather than strings and summing
  • get rid of the LINQ
  • short-circuit if you get all ones (compare to ~((~0)<<k) , where k is the number of bits being tested for)
  • remember to compare any result to the "best" combination we found against the most skilled worker

This is still O(n) + O(m^2) where m <= n is the number of people with skills different to the most skilled worker
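
A few of these points (integer bit math, a hardware-backed Hamming weight, the triangle instead of the square, and the all-ones short-circuit) can be sketched as below. This is an illustration under assumptions, not a definitive implementation: it assumes at most 64 topics so that a row fits in a ulong, it assumes .NET Core 3.0 or later for System.Numerics.BitOperations.PopCount, and the names TeamTopics, ParseRow and MaxTopics are hypothetical. The pruning against the "best" individual is left out to keep it short.

```csharp
using System;
using System.Numerics;

public static class TeamTopics
{
    // Parse a row such as "10101" into a bitmask (assumes at most 64 topics).
    public static ulong ParseRow(string row)
    {
        ulong mask = 0;
        foreach (char c in row)
            mask = (mask << 1) | (c == '1' ? 1UL : 0UL);
        return mask;
    }

    // Largest number of topics known by any two-person team.
    public static int MaxTopics(string[] rows)
    {
        int k = rows[0].Length;
        // Mask with the k low bits set; same idea as ~((~0) << k).
        ulong all = k == 64 ? ulong.MaxValue : (1UL << k) - 1;
        ulong[] masks = Array.ConvertAll(rows, ParseRow);

        int best = 0;
        for (int i = 0; i < masks.Length; i++)
        {
            // Triangle only: pair (i, j) is the same team as (j, i).
            for (int j = i + 1; j < masks.Length; j++)
            {
                ulong team = masks[i] | masks[j];
                if (team == all) return k;   // short-circuit: nothing can beat this
                best = Math.Max(best, BitOperations.PopCount(team));
            }
        }
        return best;
    }

    public static void Main()
    {
        var rows = new[] { "10101", "11100", "11010", "00101" };
        Console.WriteLine(MaxTopics(rows));   // prints 5
    }
}
```

For more than 64 topics the same loop would work with an array of ulong words per person, OR-ing and pop-counting word by word.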


Pathological but technically correct answer:

insert a Thread.Sleep(FourYears) - all solutions are now essentially O(1)

Your solution is asymptotically as efficient as it gets, because you need to examine all pairs to arrive at the maximum. You can make your code more efficient by replacing strings with BitArray objects, like this:

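    using System;
    using System.Collections;
    using System.Collections.Generic;
    using System.Linq;

    var topic = new List<BitArray>();
    string line;
    while ((line = Console.ReadLine()) != null) {
        topic.Add(new BitArray(line.Select(c => c=='1').ToArray()));
    }
    // Or() mutates its receiver, so OR into a copy of t1; BitArray.Count is the
    // length of the array, so count the set bits explicitly.
    var res =
       (from t1 in topic
        from t2 in topic
        select new BitArray(t1).Or(t2).Cast<bool>().Count(b => b)).Max();
    Console.WriteLine(res);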

Demo.
