Given two lists where one is scaled by some factor alpha:
from random import randint
alpha = 1.2
x = [randint(1, 100) for _ in range(1000)]
y = [int(alpha * i) for i in x]
I want to filter both lists for values under some threshold, such that the difference in the number of elements returned from the two lists is minimized. So if my threshold for x is 40, then len([i for i in x if i < 40]) ~ 400.
I want to know what the threshold value for y should be, when alpha is unknown, so that the number of elements returned is also ~ 400, i.e. 48 for this example.
You can calculate an average alpha as:
alpha = sum((yn / float(xn)) for xn, yn in zip(x, y)) / len(x)
then:
y_threshold = int(alpha * x_threshold)
If minimising abs(len(filtered_x) - len(filtered_y)) is crucial, you could then carry out a local search around y_threshold: try the neighbouring integer thresholds and keep the one that gives the smallest difference.
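The average-ratio estimate and the local search can be combined in a few lines. This is only a minimal sketch: the helper names count_below and estimate_y_threshold, the fixed seed, and the search radius of 5 are assumptions made for illustration, not part of the original answer.

```python
from random import randint, seed

seed(0)  # fixed seed so the sketch is reproducible
true_alpha = 1.2  # used only to build the data; treated as unknown below
x = [randint(1, 100) for _ in range(1000)]
y = [int(true_alpha * xi) for xi in x]


def count_below(values, threshold):
    """Number of elements strictly below the threshold."""
    return sum(1 for v in values if v < threshold)


def estimate_y_threshold(x, y, x_threshold, radius=5):
    """Guess y_threshold from the mean ratio, then refine it locally.

    radius is a hypothetical tuning parameter: how many integers on
    each side of the initial guess are tried.
    """
    alpha = sum(yn / xn for xn, yn in zip(x, y)) / len(x)
    guess = int(alpha * x_threshold)
    target = count_below(x, x_threshold)
    candidates = range(guess - radius, guess + radius + 1)
    # Keep the candidate whose count is closest to the count for x.
    return min(candidates, key=lambda t: abs(count_below(y, t) - target))


y_threshold = estimate_y_threshold(x, y, 40)
```

Each candidate costs one pass over y, so the refinement stays cheap as long as the radius is small.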
Your x_threshold tells you how many elements of x are below it (here about 400). So you just need to find the element of y that ranks 400th and use it as y_threshold.
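You can do that by sorting y (which is overkill) or by selecting the Nth smallest element (which can be done in O(N)). This approach achieves delta=0 unless equal values in y straddle the cut, in which case no threshold can separate them.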
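The rank-based approach might look like the following. It is a sketch using a full sort for brevity rather than an O(N) selection; the function name rank_matched_threshold and the fixed seed are assumptions for illustration.

```python
from random import randint, seed

seed(0)  # fixed seed so the sketch is reproducible
x = [randint(1, 100) for _ in range(1000)]
y = [int(1.2 * xi) for xi in x]  # the factor 1.2 is never used below


def rank_matched_threshold(x, y, x_threshold):
    """Return a threshold for y that keeps as many elements as x keeps."""
    k = sum(1 for v in x if v < x_threshold)  # elements of x kept
    if k >= len(y):
        return max(y) + 1  # everything in y must pass the filter
    # With 0-based indexing, sorted(y)[k] is the first element that
    # must be excluded; at most k elements are strictly smaller.
    return sorted(y)[k]


y_threshold = rank_matched_threshold(x, y, 40)
kept_x = [v for v in x if v < 40]
kept_y = [v for v in y if v < y_threshold]
```

No estimate of alpha is needed here, only the ranks. If several elements of y equal to the chosen value sit on both sides of position k, the filter keeps fewer than k elements.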
For a simpler, approximate solution, estimate alpha as the ratio of the sums of both lists and set y_threshold = alpha * x_threshold. (The least-squares estimator of alpha, Sum(x*y)/Sum(x^2), or the ratio of standard deviations, may be preferred.)
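As a rough illustration, the two estimators can be written as below. The function names alpha_ratio_of_sums and alpha_least_squares and the fixed seed are assumptions for this sketch; the least-squares form is the fit of y = alpha * x through the origin.

```python
from random import randint, seed

seed(0)  # fixed seed so the sketch is reproducible
x = [randint(1, 100) for _ in range(1000)]
y = [int(1.2 * xi) for xi in x]  # 1.2 is the value to be recovered


def alpha_ratio_of_sums(x, y):
    """Estimate alpha as sum(y) / sum(x)."""
    return sum(y) / sum(x)


def alpha_least_squares(x, y):
    """Least-squares slope through the origin: sum(x*y) / sum(x^2)."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / sxx


x_threshold = 40
y_threshold_sums = alpha_ratio_of_sums(x, y) * x_threshold
y_threshold_lsq = alpha_least_squares(x, y) * x_threshold
```

Because int() truncates every y, both estimates come out slightly below the true factor, so the resulting threshold may need rounding up or a small local adjustment.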