Finding the median of more than 20 million integers with only 3 to 4 distinct values in 1.5 seconds
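I am trying to sort, and find the median of, a string of integers that contains only 3 to 4 different integers.
I am dealing with roughly 20 to 25 million numbers. Each time a new integer is added to the vector, I am supposed to sort the vector, find the median, and add that median to a separate "Total" variable, which accumulates every median generated so far.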
1 Median: 1 Total: 1
1 , 2 Median: (1+2)/2 = 1 Total: 1 + 1 = 2
1 , 2 , 3 Median: 2 Total: 2 + 2 = 4
1 , 1 , 2 , 3 Median: (1+2)/2 = 1 Total: 4 + 1 = 5
1 , 1 , 1 , 2 , 3 Median: 1 Total: 5 + 1 = 6
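I am trying to find a way to optimize my code further, because it is not efficient enough (it has to finish in about 2 s). Does anyone have an idea how to speed up my logic?
I am currently using 2 heaps (priority queues) in C++. One works as a max-heap and the other as a min-heap.
I got the idea from "Data structure to find median":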
You can use 2 heaps, that we will call Left and Right.
Left is a Max-Heap.
Right is a Min-Heap.
Insertion is done like this:
If the new element x is smaller than the root of Left then we insert x to
Left.
Else we insert x to Right.
If, after insertion, Left holds more than one element more than Right,
then we call Extract-Max on Left and insert the result into Right.
Else if, after insertion, Right holds more elements than Left, then we
call Extract-Min on Right and insert the result into Left.
The median is always the root of Left.
So insertion is done in O(lg n) time and getting the median is done in O(1)
time.
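The insertion rule quoted above can be sketched in C++ roughly as follows. This is a minimal sketch, not the asker's actual code: the class name `TwoHeapMedian` and its member names are hypothetical. It also makes two assumptions that the quoted text does not state: an element equal to the root of Left goes into Left, and `median()` averages the two roots with integer division when both heaps have the same size, so that it matches the worked example in the question (the quoted text simply returns the root of Left).

```cpp
#include <functional>
#include <queue>
#include <vector>

// Hypothetical helper: running median kept in two heaps.
class TwoHeapMedian
{
public:
    void insert(int x)
    {
        // Assumption: values equal to the root of Left also go to Left.
        if (left_.empty() || x <= left_.top())
            left_.push(x);
        else
            right_.push(x);

        // Rebalance so that Left has as many elements as Right, or one more.
        if (left_.size() > right_.size() + 1)
        {
            right_.push(left_.top()); // Extract-Max on Left, insert into Right
            left_.pop();
        }
        else if (right_.size() > left_.size())
        {
            left_.push(right_.top()); // Extract-Min on Right, insert into Left
            right_.pop();
        }
    }

    // Precondition: at least one element has been inserted.
    int median() const
    {
        if (left_.size() == right_.size())
            return (left_.top() + right_.top()) / 2; // even count: integer average
        return left_.top();                          // odd count: root of Left
    }

private:
    std::priority_queue<int> left_;                                       // max-heap
    std::priority_queue<int, std::vector<int>, std::greater<int>> right_; // min-heap
};
```

Inserting 1, 2, 3, 1, 1 in that order and calling `median()` after each insertion yields 1, 1, 2, 1, 1, which sums to the total of 6 from the example in the question.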
But it is still not fast enough...
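If the string contains only three to four different integers, you can simply keep track of how many times each one appears by traversing the string once. Adding elements to (and removing them from) this representation can then also be done in constant time.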
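#include <cstddef>
#include <map>
#include <vector>

class MedianFinder
{
public:
    explicit MedianFinder(const std::vector<int>& inputString)
    {
        for (int element : inputString)
            addStringEntry(element);
    }

    void addStringEntry(int entry)
    {
        _counts[entry]++; // operator[] inserts 0 first if entry is not in the map yet.
        _size++;
    }

    // Median using integer division for even sizes, as in the question's example.
    // Precondition: at least one element has been added.
    int getMedian() const
    {
        // 0-based ranks of the two middle elements (identical when _size is odd).
        const std::size_t lowerRank = (_size - 1) / 2;
        const std::size_t upperRank = _size / 2;
        std::size_t cumulativeCount = 0;
        int lower = 0;
        bool lowerFound = false;
        for (const auto& kvp : _counts) // std::map visits keys in ascending order
        {
            cumulativeCount += kvp.second;
            if (!lowerFound && cumulativeCount > lowerRank)
            {
                lower = kvp.first;
                lowerFound = true;
            }
            if (cumulativeCount > upperRank)
                return (lower + kvp.first) / 2; // handles a median between two buckets
        }
        return 0; // only reached if no elements have been added
    }

private:
    std::map<int, std::size_t> _counts;
    std::size_t _size = 0;
};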
The trivial task of summing up the medians is not shown here.
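For completeness, one possible way to do the summing is sketched below. It is only an illustration under stated assumptions: the function name `sumOfRunningMedians` is hypothetical, the median of an even-sized set is taken as the integer average of the two middle values (as in the question's example), and the total is kept in a `long long` so that 25 million medians cannot overflow it. With only 3 to 4 distinct keys the inner loop runs at most 4 times per insertion, so the work per element is effectively constant.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical helper: adds up the median obtained after each insertion.
long long sumOfRunningMedians(const std::vector<int>& values)
{
    std::map<int, std::size_t> counts; // value -> number of occurrences so far
    std::size_t size = 0;
    long long total = 0;

    for (int value : values)
    {
        ++counts[value];
        ++size;

        // 0-based ranks of the two middle elements (identical when size is odd).
        const std::size_t lowerRank = (size - 1) / 2;
        const std::size_t upperRank = size / 2;

        std::size_t cumulative = 0;
        int lower = 0;
        bool lowerFound = false;
        for (const auto& bucket : counts) // keys in ascending order
        {
            cumulative += bucket.second;
            if (!lowerFound && cumulative > lowerRank)
            {
                lower = bucket.first;
                lowerFound = true;
            }
            if (cumulative > upperRank)
            {
                total += (lower + bucket.first) / 2; // integer average of the middle pair
                break;
            }
        }
    }
    return total;
}
```

Called with the sequence 1, 2, 3, 1, 1 from the question, this returns 6.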