
Minimize delta between two lists

Given two lists where one is scaled by some factor alpha:

from random import randint

alpha = 1.2
x = [randint(1, 100) for _ in range(1000)]
y = [int(alpha * i) for i in x]

I want to filter both lists for values under some threshold such that the difference in the number of elements returned from the two lists is minimized. So if my threshold for x is 40, then len([i for i in x if i < 40]) ~ 400. I want to know what the threshold for y should be when alpha is unknown, so that the number of elements returned is also ~ 400, i.e. 48 for this example.
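To make the objective concrete, here is a small sketch of the quantity being minimised. The helpers `count_below` and `delta` are hypothetical names introduced for illustration, the fixed `seed(0)` is added only so the run is reproducible, and the strict `<` filter is the one used in the question.

```python
from random import randint, seed


def count_below(values, threshold):
    """Number of elements strictly below threshold (the filter from the question)."""
    return sum(1 for v in values if v < threshold)


def delta(x, y, x_threshold, y_threshold):
    """Absolute difference between the sizes of the two filtered lists."""
    return abs(count_below(x, x_threshold) - count_below(y, y_threshold))


seed(0)  # assumption: fixed seed so the example is reproducible
alpha = 1.2
x = [randint(1, 100) for _ in range(1000)]
y = [int(alpha * i) for i in x]

# int(1.2 * i) < 48 holds exactly when i < 40, so delta is 0 for this pair
print(count_below(x, 40), count_below(y, 48), delta(x, y, 40, 48))
```

The two counts change with the seed, but for this integer data the last value should always be 0: int(1.2 * 39) is 46 and int(1.2 * 40) is 48, so the two filters keep exactly the same original elements.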

You can calculate an average alpha as:

alpha = sum((yn / float(xn)) for xn, yn in zip(x, y)) / len(x)

then:

y_threshold = int(alpha * x_threshold)
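A runnable sketch of the average-ratio estimate, assuming Python 3 (where `/` is already true division, so the `float()` call is not needed) and that no element of x is zero. The function names are hypothetical. Because `int()` truncates the entries of y, the mean ratio tends to come out a little below the true alpha (small x values are affected most, e.g. int(1.2 * 1) / 1 is 1.0), so the resulting threshold can land slightly under 48.

```python
from random import randint, seed


def estimate_alpha_mean_ratio(x, y):
    """Mean of the element-wise ratios y_n / x_n (assumes every x_n is non-zero)."""
    return sum(yn / xn for xn, yn in zip(x, y)) / len(x)


def y_threshold_from_ratio(x, y, x_threshold):
    """Scale x_threshold by the estimated alpha; int() truncates as in the answer."""
    return int(estimate_alpha_mean_ratio(x, y) * x_threshold)


seed(0)  # assumption: fixed seed for reproducibility
x = [randint(1, 100) for _ in range(1000)]
y = [int(1.2 * i) for i in x]

print(estimate_alpha_mean_ratio(x, y), y_threshold_from_ratio(x, y, 40))
```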

If minimising abs(len(filtered_x) - len(filtered_y)) is crucial, you could then carry out a local search around y_threshold, i.e. try nearby thresholds and keep the one with the smallest difference.
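A minimal sketch of such a local search, assuming integer thresholds and the strict `<` filter. The search radius of 5 and the names `count_below` and `local_search_threshold` are hypothetical choices; each candidate costs one O(N) pass over y, so the search is O(radius * N).

```python
from random import randint, seed


def count_below(values, threshold):
    """Number of elements strictly below threshold."""
    return sum(1 for v in values if v < threshold)


def local_search_threshold(x, y, x_threshold, y_start, radius=5):
    """Try every integer threshold in [y_start - radius, y_start + radius] and
    return the one minimising abs(len(filtered_x) - len(filtered_y)).
    On ties, min() keeps the first (smallest) candidate."""
    target = count_below(x, x_threshold)
    candidates = range(y_start - radius, y_start + radius + 1)
    return min(candidates, key=lambda t: abs(target - count_below(y, t)))


seed(0)  # assumption: fixed seed for reproducibility
x = [randint(1, 100) for _ in range(1000)]
y = [int(1.2 * i) for i in x]

# start from a slightly low estimate, as the average-ratio method may give
t = local_search_threshold(x, y, 40, y_start=47)
print(t, count_below(x, 40), count_below(y, t))
```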

Your x_threshold tells you how many x's are below it (here about 400; call this count k). So you just need to find the (k+1)-th smallest element of y (index k in zero-based sorted order) and use it as y_threshold, with the same strict < filter.

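You can do that by sorting y (which is overkill) or by selecting the k-th order statistic directly, which can be done in O(N). This approach achieves delta = 0 whenever y has no tie at the cut point, which always holds here: int(1.2 * i) is strictly increasing in the integer i, so ties in y only mirror ties in x.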
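A sketch of the rank-based approach in plain Python. `kth_smallest` is a hypothetical helper implementing randomised quickselect (expected O(N), not the worst-case-linear median-of-medians variant); `sorted(y)[k]` gives the same value in O(N log N). It assumes the strict `<` filter, so the threshold is the element at zero-based index k, where k is the number of x values that pass.

```python
import random


def kth_smallest(values, k):
    """Value at zero-based index k of sorted(values), via randomised quickselect."""
    items = list(values)
    while True:
        pivot = random.choice(items)
        lows = [v for v in items if v < pivot]
        n_equal = sum(1 for v in items if v == pivot)
        if k < len(lows):
            items = lows                      # answer lies among the smaller values
        elif k < len(lows) + n_equal:
            return pivot                      # index k falls inside the pivot's run
        else:
            k -= len(lows) + n_equal          # skip smaller values and pivot copies
            items = [v for v in items if v > pivot]


def rank_threshold(x, y, x_threshold):
    """Threshold for y that lets through as many elements as x_threshold does for x."""
    k = sum(1 for v in x if v < x_threshold)  # number of x values that pass
    if k >= len(y):
        return max(y) + 1                     # every y must pass
    return kth_smallest(y, k)                 # the (k+1)-th smallest y


random.seed(0)  # assumption: fixed seed for reproducibility
x = [random.randint(1, 100) for _ in range(1000)]
y = [int(1.2 * i) for i in x]

t = rank_threshold(x, y, 40)
print(t, sum(v < 40 for v in x), sum(v < t for v in y))
```

If y could contain ties at the cut point (not possible for this data), no threshold would give exactly k and this one would let through fewer than k elements. The standard library's `heapq.nsmallest(k + 1, y)[-1]` is another way to get the same element, in O(N log k).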

For a simpler and approximate solution, estimate alpha as the ratio of the sums of both lists and set y_threshold = alpha * x_threshold. (The least-squares estimator of alpha, Sum(yx)/Sum(x^2), or the ratio of standard deviations, may be preferred.)
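The three alternative estimators can be sketched side by side. The function names are hypothetical, the population standard deviation (`statistics.pstdev`) is an assumption (the sample version gives the same ratio, since its n/(n-1) factor cancels), and the threshold is left unrounded because a float works with the strict `<` filter.

```python
import statistics
from random import randint, seed


def alpha_ratio_of_sums(x, y):
    """sum(y) / sum(x)."""
    return sum(y) / sum(x)


def alpha_least_squares(x, y):
    """Least squares through the origin: minimises sum((y_n - alpha * x_n) ** 2),
    giving Sum(yx) / Sum(x^2)."""
    return sum(xn * yn for xn, yn in zip(x, y)) / sum(xn * xn for xn in x)


def alpha_std_ratio(x, y):
    """Ratio of the standard deviations of y and x."""
    return statistics.pstdev(y) / statistics.pstdev(x)


seed(0)  # assumption: fixed seed for reproducibility
x = [randint(1, 100) for _ in range(1000)]
y = [int(1.2 * i) for i in x]
x_threshold = 40

for estimator in (alpha_ratio_of_sums, alpha_least_squares, alpha_std_ratio):
    alpha_hat = estimator(x, y)
    print(estimator.__name__, alpha_hat, alpha_hat * x_threshold)
```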
