
Decimals as keys for unordered_map

I've often heard that you're not supposed to use float values as keys in an unordered_map or other hash tables. For my use case, I have float values that are guaranteed to have at most two decimal places (hundredths). As these values come in, I would like to count the number of times each one appears. I only need ~90% accuracy in counting. Would using a hash table to store the counts be viable, or would this run into too many performance issues?

From what I understand, you want to create a histogram. Using a map for a histogram is a waste of resources.

A simple array or vector is more than enough for this task, and for a bounded range at a fixed resolution it will be more efficient than a map.

#include <cmath>    // floorf / std::floor
#include <cstddef>  // size_t
#include <vector>

constexpr float hist_bin_width = .01f; // change to the desired resolution

// Because of IEEE 754 single-precision limits (24-bit significand),
//   the finest useful resolution is about (hist_max - hist_min) / (1 << 24),
//   which gives roughly 16 Mi bins (1 << 24), plus or minus 1.
// You shouldn't need that high a resolution.

// Precomputed inverse, so the code below can multiply instead of divide.
constexpr float hist_inv_bin_width = 1.f / hist_bin_width;

constexpr float hist_min  = -1.f;       // lower limit
constexpr float hist_max  =  1.f;       // upper limit

size_t samples_count = 0;     // total number of values tallied
// Used to count out-of-bounds values.
size_t below_min_count = 0;
size_t above_max_count = 0;

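// Useful helper. Note: std::floor() is only constexpr from C++23 (some
// compilers accept it earlier as an extension), so this is a plain inline function.
inline int value_to_index(float value) noexcept
{
    // Adding .5f before flooring rounds to the nearest bin.
    return static_cast<int>(std::floor(((value - hist_min) * hist_inv_bin_width) + .5f));
}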

inline constexpr float index_to_value(int i) noexcept
{
    return (i * hist_bin_width) + hist_min;
}

void Tally(std::vector<size_t>& histogram, float value) noexcept
{
    ++samples_count;

    int i = value_to_index(value);
    if (i < 0)
        ++below_min_count;
    else if (static_cast<size_t>(i) >= histogram.size())  // i >= 0 here, so the cast is safe
        ++above_max_count;
    else
        ++histogram[i];
}
// ..

int main()
{
    std::vector<size_t> histogram;
    histogram.resize(1 + value_to_index(hist_max));
    
    std::vector<float> input_values;

    //...

    for (auto val : input_values)
        Tally(histogram, val);
   
    // ...
    
    return 0;
}
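
Reading the counts back out is the reverse mapping, from bin index to value. The following is a minimal standalone sketch of that step, not part of the code above: the names bin_width, range_min, range_max, to_index, build_histogram and format_histogram are hypothetical, chosen only so the snippet compiles on its own, and out-of-range values are simply dropped here rather than counted.

```cpp
#include <cmath>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical standalone copies of the constants used in the answer.
constexpr float bin_width = 0.01f;
constexpr float inv_bin_width = 1.0f / bin_width;
constexpr float range_min = -1.0f;
constexpr float range_max = 1.0f;

// Map a value to the nearest bin index (same rounding as value_to_index).
inline int to_index(float value) noexcept
{
    return static_cast<int>(std::floor((value - range_min) * inv_bin_width + 0.5f));
}

// Count values into bins; out-of-range values are ignored in this sketch.
std::vector<std::size_t> build_histogram(const std::vector<float>& values)
{
    std::vector<std::size_t> bins(static_cast<std::size_t>(to_index(range_max)) + 1, 0);
    for (float v : values) {
        const int i = to_index(v);
        if (i >= 0 && static_cast<std::size_t>(i) < bins.size())
            ++bins[static_cast<std::size_t>(i)];
    }
    return bins;
}

// Format every non-empty bin as "value: count", one bin per line.
std::string format_histogram(const std::vector<std::size_t>& bins)
{
    std::ostringstream out;
    out.setf(std::ios::fixed);
    out.precision(2);  // two decimals, matching the .01 bin width
    for (std::size_t i = 0; i < bins.size(); ++i) {
        if (bins[i] != 0)
            out << (range_min + static_cast<float>(i) * bin_width) << ": " << bins[i] << '\n';
    }
    return out.str();
}
```

Calling build_histogram on the input values and streaming format_histogram(bins) to std::cout lists only the bins that were actually hit, which is usually what you want when most of the range is empty.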

If you are worried about the size and range of your histogram, consider this: 32-bit floats have a 24-bit significand (mantissa), so they can represent only about 1<<24 = 16777216 distinct values at a resolution of .01. Other values will either differ by less than .01, or be so large that the gap between two consecutive binary values is larger than .01. This rule of thumb is precise enough for all resolutions. Note: it is only an approximation because 1/.01 = 100 is not an exact power of 2, but it is still quite precise.
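
The spacing argument can be checked directly. Here is a minimal sketch that relies only on the standard std::nextafter from <cmath>; the helper names gap_above and resolvable are made up for illustration.

```cpp
#include <cmath>
#include <limits>

// Gap between x and the next representable float above it.
inline float gap_above(float x) noexcept
{
    return std::nextafter(x, std::numeric_limits<float>::infinity()) - x;
}

// True if floats near x are spaced finely enough to tell apart
// two values that differ by `resolution`.
inline bool resolvable(float x, float resolution) noexcept
{
    return gap_above(x) < resolution;
}
```

For instance, resolvable(1000.0f, 0.01f) should hold, while resolvable(16777216.0f, 0.01f) should not, because adjacent floats at that magnitude are 2 apart. The exact crossover always falls on a power of two, so it will not coincide exactly with the rule-of-thumb figure.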

A histogram can only grow so large before extra bins stop being mathematically useful, whatever the resolution. For 32-bit float input, the largest useful histogram takes around 64 MB with 32-bit counters (vector<uint32_t> instead of vector<size_t>), or around 128 MB with size_t counters, assuming size_t is 64-bit. That's not very much for modern computers; it's not even that large for a smartphone or a Raspberry Pi.
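
The memory figures follow from multiplying the bin count by the counter size. A small sketch of that arithmetic, where max_useful_bins and histogram_bytes are hypothetical names introduced only for this example:

```cpp
#include <cstddef>
#include <cstdint>

// Upper bound on useful bins for 32-bit float input (24-bit significand).
constexpr std::size_t max_useful_bins = std::size_t{1} << 24;

// Bytes needed for a histogram with the given counter type.
template <typename Counter>
constexpr std::size_t histogram_bytes(std::size_t bins = max_useful_bins) noexcept
{
    return bins * sizeof(Counter);
}

// 16 Mi bins * 4 bytes = 64 MiB; 16 Mi bins * 8 bytes = 128 MiB.
static_assert(histogram_bytes<std::uint32_t>() == 64u * 1024u * 1024u, "32-bit counters");
static_assert(histogram_bytes<std::uint64_t>() == 128u * 1024u * 1024u, "64-bit counters");
```

If each bin is certain to stay below about 4 billion counts, the 32-bit counters halve the footprint at no cost.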

The maximum useful range for your input values, and thus for your histogram, is therefore approximately the interval [-(1<<23) x 10^-2, +(1<<23) x 10^-2], that is, about [-83886.08, +83886.08].

