Combining integers and floating point numbers: performance considerations

Question

I have a complex set of template functions which do calculations in a loop, combining floating point numbers and the uint32_t loop indices. I was surprised to observe that for this kind of functions, my test code runs faster with double precision floating point numbers than with single precision ones.

As a test, I changed the format of my indices to uint16_t. After this, both the double and float version of the program were faster (as expected), but now the float version was significantly faster than the double version. I also tested the program with uint64_t indices. In this case the double and the float version are equally slow.

I imagine that this is because an uint32_t fits into the mantissa of a double but not into a float. Once the indices type was reduced to uint16_t, they also fit into the mantissa of a float and a conversion should be trivial. In case of uint64_t, the conversion to double also needs rounding, which would explain why both versions perform equally.

Can anybody confirm this explanation?

EDIT: Using int or long as index type, the program runs as fast as for unit16_t. I guess this speaks against what I suspected first.

EDIT: I compiled the program for windows on an x86 architecture.

EDIT: Here is a piece of code that reproduces the effect of double being faster as float for uint32_t and both cases being equally fast for int. Please do not comment on the usefulness of this code. It is a modified fragment of code reproducing the effect which does nothing sensible.

The main file:

#include "stdafx.h"

typedef short spectraType;
typedef int intermediateValue;
typedef double returnType;

#include "Preprocess_t.h"
#include "Windows.h"
#include <iostream>

int main()
{
    const size_t numberOfBins = 10000;
    const size_t numberOfSpectra = 500;
    const size_t peakWidth = 25;

    bool startPeak = false;
    short peakHeight;

    Preprocess<short, returnType> myPreprocessor;
    std::vector<returnType> processedSpectrum;

    std::vector<std::vector<short>> spectra(numberOfSpectra, std::vector<short>(numberOfBins));


    std::vector<float> peakShape(peakWidth);

    LARGE_INTEGER freq, start, stop;
    double time_ms;

    QueryPerformanceFrequency(&freq);


    for (size_t i = 0; i < peakWidth; ++i)
    {
        peakShape[i] = static_cast<float>(exp(-(i - peakWidth / 2.0) *(i - peakWidth / 2.0) / 10.0));
    }


    for (size_t i = 0; i < numberOfSpectra; ++i)
    {
        size_t j = 0;
        for (; j < 200; ++j)
        {
            spectra[i][j] = rand() % 100;
        }

        for (size_t k = 0; k < 25; ++k)
        {
            spectra[i][j] = static_cast<short>(16383 * peakShape[k]);
            j++;
        }

        for (; j < numberOfBins; ++j)
        {
            startPeak = !static_cast<bool>(abs(rand()) % (numberOfBins / 4));

            if (startPeak)
            {
                peakHeight = rand() % 16384;

                for (size_t k = 0; k < 25 && j< numberOfBins; ++k)
                {
                    spectra[i][j] = peakHeight * peakShape[k] + rand() % 100;
                    j++;
                }
            }
            else
            {
                spectra[i][j] = rand() % 100;
            }
        }

        for (j = 0; j < numberOfBins; ++j)
        {
            double temp = 1000.0*exp(-(static_cast<float>(j) / (numberOfBins / 3.0)))*sin(static_cast<float>(j) / (numberOfBins / 10.0));
            spectra[i][j] -= static_cast<short>(1000.0*exp(-(static_cast<float>(j) / (numberOfBins / 3.0)))*sin(static_cast<float>(j) / (numberOfBins / 10.0)));
        }

    }

    // This is where the critical code is called

    QueryPerformanceCounter(&start);

    for (int i = 0; i < numberOfSpectra; ++i)
    {
        myPreprocessor.SetSpectrum(&spectra[i], 1000, &processedSpectrum);
        myPreprocessor.CorrectBaseline(30, 2.0);
    }

    QueryPerformanceCounter(&stop);

    time_ms = static_cast<double>(stop.QuadPart - start.QuadPart) / static_cast<double>(freq.QuadPart);

    std::cout << "time spend preprocessing: " << time_ms << std::endl;

    std::cin.ignore();

    return 0;
}

And the included header Preprocess_t.h:

#pragma once

#include <vector>

//typedef unsigned int indexType;
typedef unsigned short indexType;

template<typename T, typename Out_Type>
class Preprocess
{
public:
    Preprocess() :threshold(1), sdev(1), laserPeakThreshold(500), a(0), b(0), firstPointUsedAfterLaserPeak(0) {};
    ~Preprocess() {};

    void SetSpectrum(std::vector<T>* input, T laserPeakThreshold, std::vector<Out_Type>* processedSpectrum); ///@note We need the laserPeakThresholdParameter for the baseline correction, not onla for the shift.

    void CorrectBaseline(indexType numberOfPoints, Out_Type thresholdFactor);

private:

    void LinFitValues(indexType beginPoint);
    Out_Type SumOfSquareDiffs(Out_Type x, indexType n);

    Out_Type LinResidualSumOfSquareDist(indexType beginPoint);

    std::vector<T>* input;
    std::vector<Out_Type>* processedSpectrum;

    std::vector<indexType> fitWave_X;
    std::vector<Out_Type> fitWave;
    Out_Type threshold;
    Out_Type sdev;
    T laserPeakThreshold;
    Out_Type a, b;
    indexType firstPointUsedAfterLaserPeak;
    indexType numberOfPoints;
};

template<typename T, typename Out_Type>
void Preprocess<T, Out_Type>::CorrectBaseline(indexType numberOfPoints, Out_Type thresholdFactor)
{
    this->numberOfPoints = numberOfPoints;

    indexType numberOfBins = input->size();
    indexType firstPointUsedAfterLaserPeak = 0;

    indexType positionInFitWave = 0;

    positionInFitWave = firstPointUsedAfterLaserPeak;

    for (indexType i = firstPointUsedAfterLaserPeak; i < numberOfBins - numberOfPoints; i++) {
        LinFitValues(positionInFitWave);
        processedSpectrum->at(i + numberOfPoints) = input->at(i + numberOfPoints) - static_cast<Out_Type>(a + b*(i + numberOfPoints));


        positionInFitWave++;
        fitWave[positionInFitWave + numberOfPoints - 1] = input->at(i + numberOfPoints - 1);
        fitWave_X[positionInFitWave + numberOfPoints - 1] = i + numberOfPoints - 1;

    }
}

template<typename T, typename Out_Type>
void Preprocess<T, Out_Type>::LinFitValues(indexType beginPoint)
{
    Out_Type y_mean, x_mean, SSxy, SSxx, normFactor;
    y_mean = x_mean = SSxy = SSxx = normFactor = static_cast<Out_Type>(0);
    indexType endPoint = beginPoint + numberOfPoints;
    Out_Type temp;

    if ((fitWave_X[endPoint - 1] - fitWave_X[beginPoint]) == numberOfPoints)
    {
        x_mean = (fitWave_X[endPoint - 1] - fitWave_X[beginPoint]) / static_cast<Out_Type>(2);

        for (indexType i = beginPoint; i < endPoint; i++) {
            y_mean += fitWave[i];
        }

        y_mean /= numberOfPoints;

        SSxx = SumOfSquareDiffs(x_mean, fitWave_X[endPoint - 1]) - SumOfSquareDiffs(x_mean, fitWave_X[beginPoint]);

        for (indexType i = beginPoint; i < endPoint; i++)
        {
            SSxy += (fitWave_X[i] - x_mean)*(fitWave[i] - y_mean);
        }
    }
    else
    {
        for (indexType i = beginPoint; i < endPoint; i++) {
            y_mean += fitWave[i];
            x_mean += fitWave_X[i];
        }

        y_mean /= numberOfPoints;
        x_mean /= numberOfPoints;

        for (indexType i = beginPoint; i < endPoint; i++)
        {
            temp = (fitWave_X[i] - x_mean);
            SSxy += temp*(fitWave[i] - y_mean);
            SSxx += temp*temp;
        }
    }

    b = SSxy / SSxx;
    a = y_mean - b*x_mean;
}

template<typename T, typename Out_Type>
inline Out_Type Preprocess<T, Out_Type>::SumOfSquareDiffs(Out_Type x, indexType n)
{
    return n*x*x + n*(n - 1)*x + ((n - 1)*n*(2 * n - 1)) / static_cast<Out_Type>(6);
}

template<typename T, typename Out_Type>
Out_Type Preprocess<T, Out_Type>::LinResidualSumOfSquareDist(indexType beginPoint) 
{
    Out_Type sumOfSquares = 0;
    Out_Type temp;

    for (indexType i = 0; i < numberOfPoints; ++i) {
        temp = fitWave[i + beginPoint] - (a + b*fitWave_X[i + beginPoint]);
        sumOfSquares += temp*temp;
    }
    return sumOfSquares;
}


template<typename T, typename Out_Type>
inline void Preprocess<T, Out_Type>::SetSpectrum(std::vector<T>* input, T laserPeakThreshold, std::vector<Out_Type>* processedSpectrum)
{
    this->input = input;
    fitWave_X.resize(input->size());
    fitWave.resize(input->size());
    this->laserPeakThreshold = laserPeakThreshold;
    this->processedSpectrum = processedSpectrum;
    processedSpectrum->resize(input->size());
}

Answer 1

You are using MSVC? I had a similar effect when I implemented code that essentially was a matrix-multiplication plus a vector addition. Here, I thought that float s would be faster because they can be better SIMD-parallelized as more can be packed in the SSE registers. However, using double s was much faster.

After some investigation, I figured out from the assembler code that the float's need conversion from the internal FPU precision and this rounding was consuming most of the runtime. You can change the FP model to something that is faster with the cost of reduced precision. There is also some discussion in older threads here at SO.

Combining integers and floating point numbers: performance considerations

Question

1 answers

solution1
0 2016-06-22 15:32:08

Combining integers and floating point numbers: performance considerations

Question

1 answers

solution1 0 2016-06-22 15:32:08

solution1
0 2016-06-22 15:32:08