Nearest neighbors with custom weights in Python scikit-learn

Question

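Good evening,

I want to use a nearest neighbors model for regression with non-uniform weights. I saw in the User Guide that I can pass weights='distance' when declaring the model, so that the weights are inversely proportional to the distance, but the results I got were not what I wanted.

I saw in the documentation that I can use a function to compute the weights used in prediction (given the distances), so I created the following function: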

from sklearn.neighbors import KNeighborsRegressor
import numpy
nparray = numpy.array

def customized_weights(distances: nparray)->nparray:
    for distance in distances:
        if (distance >= 100 or distance <= -100):
            yield  0

        yield (1 - abs(distance)/100)

and declared the model like this:

knn: KNeighborsRegressor = KNeighborsRegressor(n_neighbors=50, weights=customized_weights ).fit(X_train, y_train)

Up to that point, everything works. But when I try to predict with the model, I get this error:

  File "knn_with_weights.py", line 14, in customized_weights
    if (distance >= 100 or distance <= -100):
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

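I don't understand what I did wrong. The documentation says that my function should take an array of distances as its parameter and return the corresponding weights. What am I doing wrong?

Thanks in advance.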

Answer 1

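I know little about this type of regression, but it is quite possible that the distances passed to your function are a two-dimensional data structure, which would make sense for all the pairwise distances.

Why don't you put a small print statement in your custom function to print distances and distances.shape?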
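A minimal sketch of that diagnostic, assuming synthetic training data (the random X_train and y_train below are made up for illustration, and debug_weights and seen_shapes are hypothetical names). The function returns uniform weights so that the prediction still runs while the shape is being inspected:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

seen_shapes = []  # shape of every array handed to the weight function


def debug_weights(distances):
    # Show what scikit-learn actually passes to the callable.
    print(type(distances), distances.shape)
    seen_shapes.append(distances.shape)
    # Return uniform weights with the same shape as the input,
    # so predict() can finish normally.
    return np.ones_like(distances, dtype=float)


# Synthetic data, purely for illustration (not from the question).
rng = np.random.default_rng(0)
X_train = rng.uniform(0, 1000, size=(200, 2))
y_train = X_train.sum(axis=1)

knn = KNeighborsRegressor(n_neighbors=50, weights=debug_weights).fit(X_train, y_train)
predictions = knn.predict(X_train[:3])  # three query points
```

With three query points and n_neighbors=50, the printed shape should have two dimensions, which is why comparing a whole row with 100 inside an if statement raises the "truth value of an array" error.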

Answer 2

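@Jeff H's hint led me to the answer.

The input parameter of this function is a two-dimensional numpy array, distances, with shape (predictions, neighbors), where:

predictions is the number of requested predictions, i.e. the number of samples passed to knn.predict (for example knn.predict([X_1, X_2, X_3, ...]));
neighbors is the number of neighbors used (in my case, n_neighbors=50).

Each element distances[i, j] is the distance, for the i-th prediction, to its j-th nearest neighbor (the smaller j is, the smaller the distance).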
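The layout described above can be checked directly with kneighbors, which returns the neighbor distances for a set of query points. This is only a sketch on synthetic data (X_train, y_train and X_query are invented for illustration); as far as I can tell, predict hands the same kind of array to the weight function:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Synthetic data, purely for illustration (not from the question).
rng = np.random.default_rng(0)
X_train = rng.uniform(0, 1000, size=(200, 2))
y_train = X_train.sum(axis=1)
X_query = rng.uniform(0, 1000, size=(4, 2))  # four points to predict

knn = KNeighborsRegressor(n_neighbors=50).fit(X_train, y_train)

# One row per query point, one column per neighbor.
neigh_dist, neigh_ind = knn.kneighbors(X_query)
print(neigh_dist.shape)

# Check that each row is sorted from nearest to farthest neighbor.
rows_sorted = bool(np.all(np.diff(neigh_dist, axis=1) >= 0))
print(rows_sorted)
```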

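The function must return an array with the same dimensions as the input array, holding the weight that corresponds to each distance.

I don't know if this is the fastest way, but I came up with this solution: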

import numpy
nparray = numpy.array  # same alias as in the question


def customized_weights(distances: nparray) -> nparray:
    # Create a float array 'weights' with the same shape as 'distances',
    # filled with zeros.
    weights: nparray = numpy.zeros(distances.shape, dtype=float)

    for i in range(distances.shape[0]):  # for each prediction
        if distances[i, 0] >= 100:
            # If even the smallest distance is at least 100, give the nearest
            # neighbor a weight of 1, leave the other weights at zero, and
            # continue to the next prediction.
            weights[i, 0] = 1
            continue

        for j in range(distances.shape[1]):  # apply the weight function to each distance
            if distances[i, j] >= 100:
                continue  # neighbors at distance 100 or more keep weight zero

            weights[i, j] = 1 - distances[i, j] / 100

    return weights
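For larger inputs, the same rule can probably be written without Python loops. The following is a sketch, not a benchmarked replacement: the function name customized_weights_vectorized and the cutoff parameter are hypothetical, and it assumes, like the loop version, that distances are non-negative and that column 0 holds the nearest neighbor.

```python
import numpy as np


def customized_weights_vectorized(distances, cutoff=100.0):
    distances = np.asarray(distances, dtype=float)
    # Linear decay from 1 at distance 0 down to 0 at the cutoff;
    # anything at or beyond the cutoff is clipped to weight 0.
    weights = np.clip(1.0 - distances / cutoff, 0.0, None)
    # Rows whose nearest neighbor is already at or beyond the cutoff would
    # be all zeros, so give their nearest neighbor a weight of 1 instead.
    far_rows = distances[:, 0] >= cutoff
    weights[far_rows, 0] = 1.0
    return weights


# Small hand-made example: two predictions, three neighbors each.
demo = np.array([[0.0, 50.0, 150.0],
                 [120.0, 130.0, 140.0]])
result = customized_weights_vectorized(demo)
print(result)
```

It can be passed to the model in the same way, e.g. KNeighborsRegressor(n_neighbors=50, weights=customized_weights_vectorized).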
