
Efficiently find closest pairs in two vectors?

Given two sorted vectors of type double, where the vectors may be of different sizes, I'd like to generate a list of pairs, one element from each of the two vectors, where the difference between the elements in a pair is minimized, and no two pairs share an element. The vectors are both rather large, and the task must be accomplished in a very short amount of time.

I've tried using a binary search (see below) that ends by comparing neighboring elements to determine the "closest" match; however, it is not efficient enough to complete the task in the required time frame.

Interpolation search takes just as long. Using std::lower_bound() instead speeds up the code considerably; however, it doesn't consider elements less than the search value.

Is there a nice way to do this?

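#include <cstddef>
#include <vector>

// Returns the index of the element of the sorted, non-empty vector `vec`
// that is closest to `val`. Taking `vec` by const reference avoids copying
// the whole (large) vector on every call.
std::size_t binarySearch(const std::vector<double>& vec, double val) {
    std::size_t left = 0;
    std::size_t right = vec.size();          // search the half-open range [left, right)

    // Narrow down to the first element that is >= val.
    while (left < right) {
        std::size_t mid = left + (right - left) / 2;
        if (vec[mid] < val)
            left = mid + 1;
        else
            right = mid;
    }

    // Only vec[left] and its left neighbour can be closest to val.
    if (left == vec.size())
        return left - 1;
    if (left > 0 && val - vec[left - 1] <= vec[left] - val)
        return left - 1;
    return left;
}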

Hope this is what you mean:

#include <algorithm>
#include <cstddef>
#include <iostream>
#include <limits>
#include <utility>
#include <vector>

template<class T>
std::vector<std::pair<T, T>> getClosestPairs(std::vector<T> v1, std::vector<T> v2) {
    std::vector<std::pair<T, T>> vPair;
    std::pair<size_t, size_t> indexs;
    std::pair<T, T> close;

    size_t i = 0, j = 0;
    T minDiff = std::numeric_limits<T>::max();

    while(v1.size() != 0 && v2.size() != 0) {
        while(i < v1.size() && j < v2.size()) {
            T diff = v1[i] < v2[j] ? v2[j] - v1[i] : v1[i] - v2[j];
            if(diff < minDiff) {
                minDiff = diff;
                // remember the indices so both elements can be erased afterwards
                indexs = {i, j};
                // save the closest pair
                close = {v1[i], v2[j]};
            } else { // diff stopped shrinking: take the best pair of this scan (a local minimum)
                break;
            }

            // Advance the index that points to the smaller value
            if(v1[i] < v2[j]) {
                i++;
            } else {
                j++;
            }
        }
        vPair.push_back(close);
        v1.erase(v1.begin() + indexs.first);
        v2.erase(v2.begin() + indexs.second);
        i = j = 0;
        minDiff = std::numeric_limits<T>::max();
    }
    return vPair;
}
int main() {
    std::vector<double> v1 = {1, 4, 5, 7, 8, 13, 49};
    std::vector<double> v2 = {7, 10, 11, 15, 40};
    std::vector<std::pair<double, double>> result = getClosestPairs(v1, v2);
    for(auto [a, b] : result) std::cout << a << ',' << b << '\n';
}

Output:

7,7
8,10
13,11
5,15
49,40

The best algorithm seems to me to be:

  1. Iterate with two indices, always incrementing the one that points to the smaller number.
  2. When the index being incremented changes (from one vector to the other), you are at a local minimum. Compare it with the best result so far:

     2.1 if you already have a better minimum, disregard it;
     2.2 if it is another occurrence of the best delta, record it;
     2.3 if it is better than the best delta, clear the results and record the new minimum.
