
How do I use a least-squares algorithm to match two data sets through a linear equation in Python?

I have two one-dimensional vectors. One contains data measured by a measurement system; the other contains calibration data whose shape and timing exactly match the measured data (both are essentially a single pulse, and the pulses are synced in the time domain).

I want to match the calibration data curve to the originally measured data through the simple transformation original_data = (calibration_data - offset) * gain.

I need to find the offset and gain parameters such that the two traces look as similar as possible. My idea is to minimise the least-squares sum sum_i( ((calibration_i - offset) * gain - measured_i) ** 2 ) over the two data sets by tweaking the gain and offset of the transformation function.
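For reference, that objective can be evaluated in a single vectorized step (a minimal sketch, assuming both traces are 1-D numpy arrays of equal length; the function name is made up for illustration):

    import numpy as np

    def residual_ss(gain, offset, calibration, measured):
        # sum_i ((calibration_i - offset) * gain - measured_i) ** 2
        return np.sum(((calibration - offset) * gain - measured) ** 2)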

I've implemented a brute-force algorithm of this kind:

    from operator import sub
    import numpy as np

    offset = 0
    gain = 1.0
    firstIteration = True
    lastlstsq = 0
    iterations = 0

    for ioffset in np.arange(-32768, 32768, 50):
        for igain in np.arange(1, 5, 0.1):
            # prepare the trace by applying the candidate transformation:
            int1 = map(lambda c: (c - ioffset) * igain, self.fetcher.yvalues['int1'])

            # this is pretty heavy computation here: sum of squared differences
            lstsq = sum(map(lambda c: c**2, map(sub, self.fetcher.yvalues['int0'], int1)))
            if firstIteration:
                # just store the first result
                lastlstsq = lstsq
                offset = ioffset
                gain = igain
                firstIteration = False
            elif lstsq < lastlstsq:
                # got a better match:
                lastlstsq = lstsq
                offset = ioffset
                gain = igain
                print("Iteration", iterations, "squares =", lstsq, "offset =", offset, "gain =", gain)
            iterations += 1
It finds the best match, but it is way too slow and not precise enough: I'd like to search igain with a 0.01 step and ioffset with a 0.5 step, and at that resolution this algorithm is completely useless.

Is there any way to solve this kind of optimisation problem in a pythonic way? (Or is there a better approach to finding the gain and offset values that give the best match?)

Unfortunately I'm limited to numpy (no scipy), but any kind of hint is appreciated.

If the two signals are supposed to be the same shape, just y-shifted and y-scaled, you should find that

gain   = std_dev(measured) / std_dev(calibration)
offset = average(calibration - (measured / gain))
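
In numpy those two formulas are direct to evaluate (a minimal sketch; measured and calibration are assumed to be 1-D numpy arrays of equal length):

    import numpy as np

    # closed-form estimates from the formulas above
    gain = np.std(measured) / np.std(calibration)
    offset = np.mean(calibration - measured / gain)

    # apply the question's transformation to check the overlay
    fitted = (calibration - offset) * gain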

If you are happy with a solution of the form

measuredData = calibrationData * gain + offset

then finding a solution is simply a linear regression problem. This is probably best solved using the normal equation, which gives you the fit that minimises the sum of squared errors, which is what I think you are after.

Concretely, in Python the solution could be found using the numpy function pinv:

    from numpy.linalg import pinv
    from numpy import transpose, dot, vstack, ones

    # design matrix with a column of ones so the offset is fitted too
    A = vstack([calibrationData, ones(len(calibrationData))]).T
    # normal equation: (A^T A)^-1 A^T y
    gain, offset = dot(pinv(dot(transpose(A), A)), dot(transpose(A), measuredData))

Hope this helps. Sorry, I didn't have time to double-check whether the code works :)

With the help of user3235916 I managed to write the following piece of code:

import numpy as np

measuredData = np.array(yvalues['int1'])
calibrationData = np.array(yvalues['int0'])

    # design matrix [measuredData, 1]: fits calibrationData ≈ measuredData*gain + offset
    A = np.vstack([measuredData, np.ones(len(measuredData))]).T
    gain, offset = np.linalg.lstsq(A, calibrationData)[0]

Then I could use the following transformation to recalculate measuredData to calibrationData:

    measuredData * gain + offset

Fits perfectly (at least visually).
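
For completeness, here is a self-contained sketch of the same approach on synthetic data (the pulse shape, noise level, and parameter values are made up for illustration):

    import numpy as np

    # synthetic calibration trace: a single Gaussian pulse
    t = np.linspace(-1.0, 1.0, 500)
    calibrationData = np.exp(-t**2 / 0.02)

    # simulate the measurement with a known gain and offset, plus a bit of noise
    measuredData = (calibrationData - 0.3) * 2.5
    measuredData = measuredData + np.random.normal(0.0, 0.01, measuredData.size)

    # least-squares fit of calibrationData ≈ measuredData*gain + offset
    A = np.vstack([measuredData, np.ones(len(measuredData))]).T
    gain, offset = np.linalg.lstsq(A, calibrationData, rcond=None)[0]  # rcond=None needs numpy >= 1.14

    recalculated = measuredData * gain + offset  # should overlay calibrationData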
