
Finding the “best matching key” for a given key in a sorted STL container

Problem

I have timestamped data that I need to search by timestamp, in order to find the one existing timestamp that is closest to my input timestamp.
Preferably this should be solved with the STL. boost::* or std::tr1::* (from VS9 with Feature Pack) are also possible.
Example of timestamped data:

struct STimestampedData
{
    time_t m_timestamp; // Sorting criterion
    CData m_data;       // Payload
};

Approach with std::vector, sort() and equal_range()

Since a map or set only allows me to find exact matches, I don't get any further using one of these. So now I have a vector to which I append data as it comes in. Before searching, I use <algorithm>'s sort() and supply it with a custom comparison function.
After that I use <algorithm>'s equal_range() to find the two neighbors of a specified value x. Of these two values I check which one is closer to x, and that is my best match.
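The approach described above could be sketched roughly as follows. This is only an illustration under assumptions: `Sample`, `byTimestamp` and `findClosest` are hypothetical names, the payload is a plain int because `CData` is not shown, and the sketch uses std::lower_bound instead of equal_range, since the first bound alone already identifies both neighbors.

```cpp
#include <algorithm>
#include <cassert>
#include <ctime>
#include <vector>

// Hypothetical stand-in for the question's STimestampedData.
struct Sample {
    std::time_t timestamp; // sorting criterion
    int payload;           // placeholder for the real payload type
};

// Orders samples by timestamp only.
inline bool byTimestamp(const Sample& a, const Sample& b) {
    return a.timestamp < b.timestamp;
}

// Returns an iterator to the sample whose timestamp is closest to `key`,
// or sorted.end() if the vector is empty.
// Precondition: `sorted` is ordered by timestamp.
inline std::vector<Sample>::const_iterator
findClosest(const std::vector<Sample>& sorted, std::time_t key) {
    // Debug-only check of the precondition (linear, compiled out with NDEBUG).
    assert(std::is_sorted(sorted.begin(), sorted.end(), byTimestamp));
    if (sorted.empty()) return sorted.end();

    Sample probe = {key, 0};
    // First sample with timestamp >= key (binary search).
    std::vector<Sample>::const_iterator upper =
        std::lower_bound(sorted.begin(), sorted.end(), probe, byTimestamp);

    if (upper == sorted.end()) return upper - 1; // key is past the last sample
    if (upper == sorted.begin()) return upper;   // key is before the first sample

    std::vector<Sample>::const_iterator lower = upper - 1;
    // On a tie the later sample is returned.
    return (key - lower->timestamp) < (upper->timestamp - key) ? lower : upper;
}
```

The vector only needs to be sorted again after new data has been appended, for example with `std::sort(v.begin(), v.end(), byTimestamp)` before a batch of lookups.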


While this is not too complex, I wonder if there are more elegant solutions to this.
Maybe the STL already has an algorithm which does exactly that, so that I'm not re-inventing something here?

Update: Linear vs. binary search

I forgot to mention that I have quite a lot of data to handle, so I don't want to search linearly.
The reason I am sorting a vector with sort() is that it has random access iterators, which is not the case with a map. Using a map would not allow equal_range() to do a search with twice logarithmic complexity.
Am I correct?

I would use equal_range too for such a thing.

If you are calling sort() on your vector every time, it might be better to use a map (or set), as that is always sorted automatically, and use the member equal_range.

But that depends on the number of inserts, the number of queries, and the amount of data. (Although for something that always needs to be sorted when I query it, a map would be my first choice, and I'd only use a vector if there was a very good reason.)
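A minimal sketch of the map-based variant, assuming the timestamp is used as the map key and the payload is a std::string (a placeholder, since `CData` is not shown). `TimestampMap` and `closestInMap` are hypothetical names. The sketch calls the member lower_bound, which searches in logarithmic time, rather than the member equal_range, because one bound is enough to reach both neighbors.

```cpp
#include <cassert>
#include <ctime>
#include <map>
#include <string>

// Hypothetical container type: timestamp -> payload.
// Note that a std::map keeps only one entry per timestamp.
typedef std::map<std::time_t, std::string> TimestampMap;

// Returns an iterator to the entry whose key is closest to `key`,
// or data.end() if the map is empty.
inline TimestampMap::const_iterator
closestInMap(const TimestampMap& data, std::time_t key) {
    if (data.empty()) return data.end();

    // First entry with key >= `key`, found by the member function.
    TimestampMap::const_iterator upper = data.lower_bound(key);
    if (upper == data.begin()) return upper; // nothing smaller exists

    assert(upper != data.begin());           // invariant: a predecessor exists
    TimestampMap::const_iterator lower = upper;
    --lower;
    if (upper == data.end()) return lower;   // nothing greater or equal exists

    // On a tie the later entry is returned.
    return (key - lower->first) < (upper->first - key) ? lower : upper;
}
```

If several records can share a timestamp, a std::multimap would be needed instead; the lookup logic would stay the same.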

I would use set::lower_bound to find the matching or greater value, then decrement the iterator to check the next lower value. You should use std::set rather than std::map since your key is embedded in the object; you'll need to provide a functor that compares the timestamp members.

struct TimestampCompare
{
    bool operator()(const STimestampedData & left, const STimestampedData & right) const
    {
        return left.m_timestamp < right.m_timestamp;
    }
};
typedef std::set<STimestampedData,TimestampCompare> TimestampedDataSet;

TimestampedDataSet::iterator FindClosest(TimestampedDataSet & data, STimestampedData & searchkey)
{
    if (data.empty())
        return data.end();
    TimestampedDataSet::iterator upper = data.lower_bound(searchkey);
    if (upper == data.end())
        return --upper;
    if (upper == data.begin() || upper->m_timestamp == searchkey.m_timestamp)
        return upper;
    TimestampedDataSet::iterator lower = upper;
    --lower;
    if ((searchkey.m_timestamp - lower->m_timestamp) < (upper->m_timestamp - searchkey.m_timestamp))
        return lower;
    return upper;
}

Depending on what your usage is, you could do a simple linear search instead of a sort. Come up with a "distance" function and loop through the data, keeping track of the best match so far and its distance. When you find a better match, forget the previous one and keep the new one and its distance. When you've looped through everything, you have your match.

This works out to be O(N*S), where N is the number of items in the vector and S is the number of searches.

Your current way is O((N+S)*LogN), which is greater if the number of searches is small and bounded. Otherwise the sort / binary search is better.
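As a rough illustration of that trade-off, with constant factors ignored and N = 10^6 chosen purely as an assumed example size (so log2 N is about 20):

```latex
\begin{align*}
S = 1:   &\quad N \cdot S = 10^{6}, & (N + S)\log_2 N &\approx 2 \times 10^{7} \\
S = 100: &\quad N \cdot S = 10^{8}, & (N + S)\log_2 N &\approx 2 \times 10^{7}
\end{align*}
```

Setting N*S = (N+S)*log2 N and assuming S is much smaller than N gives a break-even near S ≈ log2 N, so under these assumptions the linear scan only wins for fewer than about 20 searches. Real constant factors will shift that number.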

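#include <cmath>
#include <cstddef>
#include <cstdlib>
#include <ctime>
#include <iostream>
#include <vector>
using namespace std;

// Returns the element of iArr which has the least distance from input.
// iArr must not be empty.
double nearestValue(const vector<double>& iArr, double input)
{
    size_t index = 0;                     // position of the best match so far
    double pivot = fabs(iArr[0] - input); // smallest distance seen so far
    for (size_t m = 1; m < iArr.size(); m++)
    {
        double temp = fabs(iArr[m] - input);

        if (temp < pivot)
        {
            pivot = temp;
            index = m;
        }
    }

    return iArr[index];
}

int main()
{
    vector<double> iArr;

    // Fill the vector with ten random values in [0, 20).
    srand(static_cast<unsigned>(time(NULL)));
    for (int m = 0; m < 10; m++)
    {
        iArr.push_back(rand() % 20);
        cout << iArr[m] << " ";
    }

    cout << "\nnearest value is: " << nearestValue(iArr, 16) << "\n";
    return 0;
}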
