I'm looking for an algorithm to rank objects. Two objects can be compared. However, the comparisons are real world comparisons that may be flawed. Also, I care more about finding out the very best object than which ones are the worst.
Think that I'm scientifically evaluating materials. I combine two materials. I want to find the best working material for in-depth testing. So, I don't care about materials that are unpromising. However, each test can be a false positive or have anomalies between those particular two materials.
Imagine that you are tasked with hiring a team of boxers. You know nothing about evaluating boxers but can ask two boxers to fight each other. There is an unlimited number of boxers in the world. But it's expensive to fly them in. Ideally, you want to hire the n best boxers. Realistically, you don't know if the boxers are going to accept your offer. Plus, you don't know how competitively the other boxing clubs bid. You are going to make offers to only the best n boxers, but have to be prepared to know which next n boxers to send offers to. That you only get the worst boxers is very unlikely.
I could think of the following approaches. However, they all have drawbacks. I feel like there should be a much better approach.
Traditional sorting algorithms could be used.
Drawback: - A false positive could serious throw of the correctness of the algorithm. - A sorting algorithm would spend half the time sorting the bottom half of the pack, which is unimportant. - Sorting algorithms start with all items. With this problem, we are allowed to do the first test, not knowing if we are allowed to do a second test. We may end up only being allowed to do two test. Or we may be allowed to do a million tests.
Drawback: - This seems pretty promising. The difficulty is to find one that allows adding one more player at a time as we are allowed more comparisons. It seems that there should be a highly specialized solution that's better than a standard tournament algorithm.
Drawback: - The algorithm doesn't correct for false positives. If there is a false positive at the top early on, it could skew the whole rest of the tests.
DRAWBACK: - The approach is very nice in that it corrects for false positives. It also allows adding more objects to the test pool easily. However, it does not consider that a win against a top object counts a lot more than a win against a bottom object. Thus, comparisons are wasted.
DRAWBACK: - I don't know how to flatten such a messy graph that could have cycles and ambiguous end nodes. There could be multiple objects that are undefeated. How would one pick a winner in such a messy graph? How would one know which comparison would be the most valuable?
DRAWBACK - The approach seems promising in that it is easy to add new objects to the pool of tested objects. It also takes into account that wins against top objects should count for more. I can't think of a good way to determine the points. That first comparison, was awarded 1 point. Once 10,000 objects are in the pool, an average win would be worth 5,000 points. The award of both tests should be roughly equal. Later comparisons overpower the earlier comparisons and make them be ignored when they shouldn't.
Does anyone have a good idea on tackling this problem?
I would search for an easily computable value for an object, that could be compared between objects to give a good enough approximation of order. You could compare each new object with the current best accurately, then insertion sort the loser into a list of the rest using its computed value.
The best will always be accurate. The ordering of the rest depending on your "value".
I would suggest looking into Elo Rating systems and its derivatives. (like Glicko, BayesElo, WHR, TrueSkill etc.)
So you assign each object a preliminary rating, and then update that value according to the matches/comparisons you make. (with bigger changes to the ratings the more unexpected the outcome was)
This still leaves open the question of how to decide which object to compare to which other object to gain most information. For that I suggest looking into tournament systems and playoff formats. Though I suspect that an optimal solution will be decidedly more ad-hoc than that.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.