Find K UNIQUE largest elements in an unsorted array of pairs

Question

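So here's the scenario. I have a very large unsorted array called gallery, which contains pairs of templates (std::vector<uint8_t>) and their associated IDs (std::string).

I have a function that is given a template and must return the IDs of the k most similar templates in the gallery (I use cosine similarity to generate a similarity score between templates).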
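For reference, here is a minimal sketch of a cosine similarity score between two templates. It assumes each byte is treated as one vector component and that both templates have the same length, which may not match how real templates are encoded. The name cosineSimilarity is hypothetical; the code further down calls computeSimilarityScore, whose implementation is not shown.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not the actual computeSimilarityScore):
// cosine similarity of two templates, treating every byte as one component.
double cosineSimilarity(const std::vector<std::uint8_t>& a,
                        const std::vector<std::uint8_t>& b) {
    if (a.empty() || a.size() != b.size()) {
        throw std::invalid_argument("templates must be non-empty and of equal length");
    }

    double dot = 0.0;
    double normA = 0.0;
    double normB = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double x = a[i];
        const double y = b[i];
        dot += x * y;
        normA += x * x;
        normB += y * y;
    }

    // The cosine is undefined when either vector is all zeros.
    if (normA == 0.0 || normB == 0.0) {
        throw std::invalid_argument("template has zero norm");
    }
    return dot / (std::sqrt(normA) * std::sqrt(normB));
}
```

Because the components are non-negative bytes, a score computed this way falls between 0 and 1, with higher meaning more similar.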

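I considered using a heap, as discussed in this post. However, the problem is that the gallery can contain multiple different templates belonging to a single ID. My function must return k unique IDs.

For context, I am building a facial recognition application. The gallery can hold multiple different templates belonging to one person (the person enrolled in the gallery using several different images, so several templates belong to their ID). The search function should return the k most similar people to the provided template, so the same ID must not be returned more than once.

An efficient algorithm for doing this in C++ would be appreciated.

Edit: code snippet for my proposed heap solution (it does not handle duplicates properly)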

    // Min-heap on score: the top is the weakest of the current candidates.
    std::priority_queue<std::pair<double, std::string>,
                        std::vector<std::pair<double, std::string>>,
                        std::greater<>> queue;

    for (const auto& templPair : m_gallery) {
        try {
            double similarity = computeSimilarityScore(templPair.templ, idTemplateDeserial);

            if (queue.size() < candidateListLength) {
                queue.emplace(similarity, templPair.id);
            } else if (queue.top().first < similarity) {
                // Replace the weakest candidate with the better one
                queue.pop();
                queue.emplace(similarity, templPair.id);
            }
        } catch (...) {
            std::cout << "Unable to compute similarity\n";
            continue;
        }
    }
    // The candidateListLength entries with the highest scores are now in queue
    // (the same ID can appear more than once)

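Here is an example that will hopefully help. For simplicity, I will assume the similarity scores have already been computed for the templates.

Template 1: similarity score: 0.4, ID: Cyrus

Template 2: similarity score: 0.5, ID: James

Template 3: similarity score: 0.9, ID: Bob

Template 4: similarity score: 0.8, ID: Cyrus

Template 5: similarity score: 0.7, ID: Vanessa

Template 6: similarity score: 0.3, ID: Ariana

Getting the IDs of the top 3 scoring templates would return [Bob, Cyrus, Vanessa]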

Answer 1

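Use a std::set (a balanced BST) instead of a heap. It also keeps its elements in order and lets you find the largest and smallest elements inserted. In addition, its insert function automatically detects duplicates and ignores them, so every element inside is always unique. The complexity is exactly the same (although it is a bit slower because of a larger constant factor).

Edit: I may not have understood the question correctly. As far as I can see, you can have multiple elements with different values that should still be treated as duplicates.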
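To illustrate why a plain std::set of (score, ID) pairs is not enough on its own, here is a minimal sketch. The alias Scored and the function buildSet are hypothetical names used only for this illustration. The set drops exact duplicates, but two entries with the same ID and different scores compare as different pairs, so both are kept.

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical alias for illustration: (similarity score, ID).
using Scored = std::pair<double, std::string>;

// Inserts every item into a std::set, which orders pairs by score first
// and then by ID, and silently ignores pairs that are already present.
std::set<Scored> buildSet(const std::vector<Scored>& items) {
    std::set<Scored> result;
    for (const auto& item : items) {
        result.insert(item);  // no effect if an equal pair already exists
    }
    return result;
}
```

With the input (0.4, Cyrus), (0.8, Cyrus), (0.8, Cyrus), (0.9, Bob), the set ends up with three elements: the repeated (0.8, Cyrus) is ignored, but Cyrus still appears twice. begin() points at the lowest score and rbegin() at the highest, which is what makes the set usable as a replacement for the heap.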

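What I would do:

Make a set of pairs (template value, ID).
Make a map where the key is an ID and the value is the template value of that ID's template currently in the set.
When you want to add a new template:
- If its ID is in the map, you have found a duplicate. If its value is worse than the value paired with that ID in the map, do nothing. Otherwise remove the pair (old value, ID) from the set, insert (new value, ID), and change the value in the map to the new value.
- If it is not in the map, just add it to both the map and the set.
When there are too many items in the set, just remove the worst one from both the set and the map.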

Answer 2

I implemented the outline from Maras's answer. It seems to get the job done.

#include <iostream>
#include <vector>
#include <map>
#include <utility>
#include <string>
#include <set>

int main() {
    const std::size_t K = 3;  // number of unique IDs to return

    std::vector<std::pair<double, std::string>> data {
        {0.4, "Cyrus"},
        {0.5, "James"},
        {0.9, "Bob"},
        {0.8, "Cyrus"},
        {0.7, "Vanessa"},
        {0.3, "Ariana"},
    };

    std::set<std::pair<double, std::string>> mySet;
    std::map<std::string, double> myMap;

    for (const auto& pair: data) {
        if (myMap.find( pair.second ) == myMap.end()) {
            // The ID is unique
            if (mySet.size() < K) {
                // The size of the set is less than the size of search candidates
                // Add the result to the map and the set
                mySet.insert(pair);
                myMap[pair.second] = pair.first;
            } else {
                // Check to see if the current score is larger than the worst performer in the set
                auto worstPairPtr = mySet.begin();

                if (pair.first > (*worstPairPtr).first) {
                    // The contender performed better than the worst in the set
                    // Remove the worst item from the set, and add the contender
                    // Remove the corresponding item from the map, and add the new contender
                    // Erase from the map first: erasing from the set invalidates the iterator
                    myMap.erase(worstPairPtr->second);
                    mySet.erase(worstPairPtr);
                    mySet.insert(pair);
                    myMap[pair.second] = pair.first;
                }
            }

        } else {
            // The ID already exists
            // Compare the contender score to the score of the existing ID.
            // If the contender score is better, replace the existing item score with the new score
            // Remove the old item from the set
            if (pair.first > myMap[pair.second]) {
                mySet.erase({myMap[pair.second], pair.second});
                mySet.insert(pair);
                myMap[pair.second] = pair.first;
            }

        }
    }

    for (auto it = mySet.rbegin(); it != mySet.rend(); ++it) {
        std::cout << (*it).second << std::endl;
    }

}

The output is

Bob
Cyrus
Vanessa
