I have an n-dimensional vector and I want to find its k nearest neighbors in a list of n-dimensional vectors using Euclidean distance.
I wrote the following code (with k=10), which works but runs too slowly, and I was wondering whether there is a faster solution.
import numpy as np

def nearest_neighbors(value, array, nbr_neighbors=1):
    # Euclidean distance from value to each vector, then the indices of the smallest
    return np.argsort(np.array([np.linalg.norm(value - x) for x in array]))[:nbr_neighbors]
Use SciPy's k-d tree (scipy.spatial.cKDTree).
A small example is available here.
Many people complain about its performance, though, and recommend sklearn's implementation instead (sklearn.neighbors, which uses this data structure internally).
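A minimal sketch of the k-d tree approach suggested above, using scipy.spatial.cKDTree; the array shapes and the random data are assumptions for illustration:

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
all_values = rng.random((1000, 16))    # 1000 vectors of dimension 16 (illustrative)
query = rng.random(16)                 # the vector whose neighbors we want

tree = cKDTree(all_values)             # build the tree once, query many times
dists, idxs = tree.query(query, k=10)  # 10 nearest neighbors, Euclidean distance
```

tree.query returns the distances and the indices of the neighbors in all_values, both sorted by increasing distance.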
As sascha said, I ended up using a dedicated library (scikit-learn's NearestNeighbors), which brought the computation time down from 50 hours to 36 minutes. This is the kind of computation I should not have tried to reimplement myself, as dedicated libraries are far more optimized for it.
NearestNeighbors also accepts a whole list of query vectors and returns the k nearest neighbors for each one.
Final code was:
from sklearn.neighbors import NearestNeighbors

def nearest_neighbors(values, all_values, nbr_neighbors=10):
    nn = NearestNeighbors(n_neighbors=nbr_neighbors, metric='cosine', algorithm='brute').fit(all_values)
    dists, idxs = nn.kneighbors(values)
    return dists, idxs
I would try using SciPy's pdist function to compute the pairwise distances by brute force: https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.pdist.html
It should be quite fast, as pdist is highly optimized. Then, for each element, pick the k nearest.
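A sketch of that brute-force approach, assuming all query vectors belong to the same set; the array shape and k are illustrative:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)
points = rng.random((200, 8))      # 200 vectors of dimension 8 (illustrative)
k = 10

# pdist returns a condensed distance vector; squareform expands it to (n, n)
dmat = squareform(pdist(points))   # Euclidean by default
np.fill_diagonal(dmat, np.inf)     # exclude each point as its own neighbor

# argpartition finds the k smallest distances per row without a full sort
nearest = np.argpartition(dmat, k, axis=1)[:, :k]
```

Note that np.argpartition returns the k nearest indices in arbitrary order; sort each row by distance afterwards if the order matters.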