Sorting by best combinations of two

I have a list of "products", each with two features, for example price and rating for books, or time and price for tickets.
Take a book (10, 15): 10 is the price in dollars (cheaper is better) and 15 is the rating from 0 to 100 (higher is better).

L = [(150, 100), (50, 15), (20, 70), (10, 40), (76, 30)]

The list should be sorted by the best combination of price and rating.
I have two solutions so far. The better one computes a "weight" for every pair as price * (1/rating) and sorts by this weight; lower is better.

res1 = {}
for i in L:
    res1[i] = i[0]*(1./i[1])
# {(10, 40): 0.25, (20, 70): 0.2857, (50, 15): 3.3333, (76, 30): 2.5333, (150, 100): 1.5}
sorted(res1, key=lambda x: res1[x])
# [(10, 40), (20, 70), (150, 100), (76, 30), (50, 15)]

The second solution is more complex and less representative. It sorts the list twice, once by price and once by rating (descending), and then combines the positions: the "weight" of an item is its 1-based position in the first sorted list multiplied by (or added to) its 1-based position in the second sorted list.

L1 = sorted(L, key=lambda x: x[0])
L2 = sorted(L, key=lambda x: x[1], reverse=True)
res = {}
for i in L:
    res[i] = (L1.index(i)+1) * (L2.index(i)+1)
res
# {(10, 40): 3, (20, 70): 4, (50, 15): 15, (76, 30): 16, (150, 100): 5}
sorted(res, key=lambda x: res[x])
# [(10, 40), (20, 70), (150, 100), (50, 15), (76, 30)]

With a lot of data, the second variant gives less representative results.
But I'm tired of reinventing the wheel, so which mathematical and algorithmic solutions can you suggest? I'd also be interested to know whether this has a solution for three or more features: price, delivery time, weight, rating, etc.

Update: Thanks to @georgesl for pointing this out. How should I deal with outliers, for example a very bad book that is also very cheap? I think they should be treated differently somehow.

Why not combine your steps into a single sort, like this:

L = [(150, 100), (50, 15), (20, 70), (10, 40), (76, 30)]
sorted(L, key=lambda x: x[0] / (x[1] * 1.0))
# [(10, 40), (20, 70), (150, 100), (76, 30), (50, 15)]

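PS: If you need a float result in Python 2, a simple way is to multiply one operand by 1.0. This tends to be faster than calling float() on the number, since it avoids a function call. In Python 3 the / operator already performs true division, so the multiplication is unnecessary.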

Your goal is to order your products according to the "best combination" of price and rating. You have considered two algorithms, and you report that the first seems to work better. What you don't tell us, and probably don't have, is a way to measure which orderings are best. So nobody can suggest a better method, because we don't know what you're going to like. How important is quality (rating) to you? You might care about it more, or less, than I do. In short: you need either an independent metric of how good an ordering is (e.g., based on the number of people who actually buy a product), or a training set that you have manually ordered the way you want to see it.

Supposing you have a training set, you can try different rankings and measure how close they come to the ordering you like (on the training data, at least; you would hope that the algorithm generalizes to other data). One way to measure that is with a rank correlation statistic.
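As a rough illustration of the rank correlation idea, here is a minimal sketch that computes Spearman's rho between a hand-made reference ordering and the two orderings from the question. The function name spearman_rho and the reference list are assumptions made up for this example; the formula used is the standard one for rankings without ties.

```python
def spearman_rho(reference, candidate):
    """Spearman rank correlation between two orderings of the same distinct items.

    Returns 1.0 for identical orderings and -1.0 for exactly reversed ones.
    """
    n = len(reference)
    if n < 2 or len(candidate) != n or set(reference) != set(candidate):
        raise ValueError("both orderings must contain the same distinct items")
    # Position of every item in the candidate ordering.
    position = {item: rank for rank, item in enumerate(candidate)}
    # Sum of squared rank differences between the two orderings.
    d_squared = sum((rank - position[item]) ** 2
                    for rank, item in enumerate(reference))
    return 1 - 6.0 * d_squared / (n * (n ** 2 - 1))


L = [(150, 100), (50, 15), (20, 70), (10, 40), (76, 30)]

# Hypothetical "ideal" order, ranked by hand purely for illustration.
reference = [(20, 70), (10, 40), (150, 100), (76, 30), (50, 15)]

# First method from the question: price divided by rating, lower is better.
by_ratio = sorted(L, key=lambda x: x[0] / (x[1] * 1.0))

# Second method from the question: the order reported for the rank product.
by_rank_product = [(10, 40), (20, 70), (150, 100), (50, 15), (76, 30)]

print(round(spearman_rho(reference, by_ratio), 3))         # 0.9
print(round(spearman_rho(reference, by_rank_product), 3))  # 0.8
```

Against this made-up reference, the ratio method scores slightly higher. With a real training set the comparison could of course go the other way.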

There's a whole family of solutions that are linear combinations of your features: a * price + b * rating, where a is probably negative since a low price is good. The bigger b is, the more important the quality rating becomes. You can set a and b to give you the optimal ranking. Or you can "fit" a more complex model, e.g. one involving squares or ratios. All you need is a way to measure how good the resulting ordering is.
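The linear combination can be sketched in a few lines. The helper names linear_score and rank_by_linear_score, and the weight values, are hypothetical and chosen only to show how changing b changes the order. The same code accepts three or more features as long as there is one weight per feature. Features on very different scales would probably need normalizing first, which this sketch does not do.

```python
def linear_score(features, weights):
    """Weighted sum of the features; a higher score means a better item."""
    if len(features) != len(weights):
        raise ValueError("need exactly one weight per feature")
    return sum(w * f for w, f in zip(weights, features))


def rank_by_linear_score(items, weights):
    """Return the items sorted from best (highest score) to worst."""
    return sorted(items, key=lambda item: linear_score(item, weights),
                  reverse=True)


L = [(150, 100), (50, 15), (20, 70), (10, 40), (76, 30)]

# Hypothetical weights (a, b): a is negative because a low price is good.
print(rank_by_linear_score(L, (-1.0, 1.0)))
# [(20, 70), (10, 40), (50, 15), (76, 30), (150, 100)]

# Tripling b makes the rating matter more, so (150, 100) moves up.
print(rank_by_linear_score(L, (-1.0, 3.0)))
# [(20, 70), (150, 100), (10, 40), (76, 30), (50, 15)]
```

To choose a and b, one option is to try a grid of values and keep the pair whose ordering agrees best with the training set under whatever ordering metric you settle on.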
