
Python: fitting curve to masked data with scipy curve_fit

I'm trying to write a script with python/numpy/scipy for data manipulation, fitting and plotting of angle-dependent magnetoresistance measurements. I'm new to Python, got the frame code from my PhD advisor, and managed to add a few hundred lines of code to the frame. After a while I noticed that some measurements had multiple blunders, and since the script should do all the manipulation automatically, I tried to mask those points and fit the curve to the unmasked points (the curve is a sine squared superposed on a linear function, so numpy.ma.polyfit isn't really a choice). However, after masking both the x and y coordinates of the problematic points, the fit would still take them into consideration, even though they wouldn't be shown in the plot. The example is simplified, but the same thing happens:

import numpy.ma as ma
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit



def Funk(x, k, y0):
    return k*x + y0

fig,ax= plt.subplots()

x=ma.masked_array([1,2,3,4,5,6,7,8,9,10],mask=[0,0,0,0,0,0,1,1,1,1])
y=ma.masked_array([1,2,3,4,5,30,35,40,45,50], mask=[0,0,0,0,0,1,1,1,1,1])


fitParamsFunk, fitCovariancesFunk = curve_fit(Funk, x, y)

ax.plot(x, Funk(x, fitParamsFunk[0], fitParamsFunk[1]))
ax.errorbar(x, y, yerr = None, ms=3, fmt='-o')
plt.show()

The second half of the points is masked and not shown in the plot, but it is still taken into account by the fit.

While writing the post I figured out that I can do this:

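import numpy as np

def Funk(x, k, y0):
    return k*x + y0

fig,ax= plt.subplots()

x=np.array([1,2,3,4,5,6,7,8,9,10])
y=np.array([1,2,3,4,5,30,35,40,45,50])
# boolean mask, True marks a bad point (an integer 0/1 array would be
# interpreted as a list of indices, not as a mask)
mask=np.array([0,0,0,0,0,1,1,1,1,1], dtype=bool)

# fit only the points that are not masked
fitParamsFunk, fitCovariancesFunk = curve_fit(Funk, x[~mask], y[~mask])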

ax.plot(x, Funk(x, fitParamsFunk[0], fitParamsFunk[1]))
ax.errorbar(x, y, yerr = None, ms=3, fmt='-o')
plt.show()

What I actually wanted

I guess that scipy curve_fit isn't meant to deal with masked arrays, but I would still like to know whether there is any workaround for this. I need to work with masked arrays because the number of data points is >10e6, but I'm only plotting 100 at once, so I would need to take the mask of the part of the array that I want to plot and assign it to another array, while copying the values of the array to another one or setting the original mask to False. Thanks for any suggestions.

If you only want to consider the valid entries, you can use the inverse of the mask as an index:

x = ma.masked_array([1,2,3,4,5,6,7,8,9,10], mask=[0,0,0,0,0,1,1,1,1,1])  # changed mask
y = ma.masked_array([1,2,3,4,5,30,35,40,45,50], mask=[0,0,0,0,0,1,1,1,1,1])

fitParamsFunk, fitCovariancesFunk = curve_fit(Funk, x[~x.mask], y[~y.mask])

PS: Note that both arrays need to have the same number of valid entries.
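When the two masks differ, as in the question's first example, one way to satisfy that requirement is to combine them so that a point is used only if it is unmasked in both arrays. The following is a minimal sketch under that assumption. The names `good`, `x_good` and `y_good` are introduced here for illustration only. The behaviour described in the question is consistent with the mask being dropped when the masked array is converted to a plain array, which the last line illustrates with `np.asarray`.

```python
import numpy as np
import numpy.ma as ma
from scipy.optimize import curve_fit

def Funk(x, k, y0):
    return k*x + y0

# same data as in the question: the two masks are not identical
x = ma.masked_array([1., 2, 3, 4, 5, 6, 7, 8, 9, 10], mask=[0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
y = ma.masked_array([1., 2, 3, 4, 5, 30, 35, 40, 45, 50], mask=[0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

# a point is usable only if it is unmasked in BOTH arrays;
# getmaskarray always returns a full boolean array, even if nothing is masked
good = ~(ma.getmaskarray(x) | ma.getmaskarray(y))

# plain ndarrays holding only the valid points
x_good = x.data[good]
y_good = y.data[good]

fitParamsFunk, fitCovariancesFunk = curve_fit(Funk, x_good, y_good)

# converting a masked array to a plain array keeps every value, masked or not
all_values = np.asarray(x)
```

Using one shared boolean array for both `x` and `y` guarantees that the two inputs passed to `curve_fit` have the same length and stay paired point by point.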

The use of a mask in numerical calculus is equivalent to the use of the Heaviside step function in analytical calculus. For example, this makes piecewise linear regression very simple:

[Image: piecewise linear regression example]

There are several examples of piecewise linear regression in the paper: https://fr.scribd.com/document/380941024/Regression-par-morceaux-Piecewise-Regression-pdf

Using the method shown in this paper, the very simple calculus below leads to a result of the expected form:

[Image: calculation and result]

Note: with a large number of points, if several points with slightly different abscissae lie in the transition area, it would be more accurate to apply the case considered on pages 29-31 of the paper referenced above.
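To illustrate the connection between a mask and the Heaviside step function, here is a small sketch of a two-segment linear fit on the question's data. It is not the method of the paper referenced above: it assumes the breakpoint `x_break` is known in advance (a value chosen here for illustration) and uses an ordinary least-squares solve with `np.heaviside` playing the role of the mask.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
y = np.array([1, 2, 3, 4, 5, 30, 35, 40, 45, 50], dtype=float)

# assumption: the position of the break between the two segments is known
x_break = 5.5

# H is 0 to the left of the breakpoint and 1 to the right of it,
# so H and (1 - H) act as two complementary masks
H = np.heaviside(x - x_break, 0.5)

# design matrix for the model y = (k1*x + b1)*(1 - H) + (k2*x + b2)*H
A = np.column_stack([x*(1 - H), 1 - H, x*H, H])

# linear least squares for the four coefficients
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
k1, b1, k2, b2 = coeffs
```

Each segment gets its own slope and intercept, and the step function decides which points contribute to which segment, in the same way a boolean mask would.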

I think that what you want to do is to define a mask that lists the indices of the "good data points" and then use that as the points to fit (and/or to plot).

As a lead author of lmfit, I would recommend using that library for curve-fitting: it has many useful features over curve_fit. With this, your example might look like this:

import numpy as np
import matplotlib.pyplot as plt
from lmfit import Model

def Funk(x, k, y0, good_points=None):  # note: add keyword argument
    f = k*x + y0
    if good_points is not None:
        f = f[good_points]       # apply mask of good data points
    return f

x = np.array([1,2,3,4,5, 6,7,8.,9,10])
y = np.array([1,2,3,4,5,30,35.,40,45,50]) 
y += np.random.normal(size=len(x), scale=0.19) # add some noise to make it fun

# make an array of the indices of the "good data points"
# does not need to be contiguous.
good_points=np.array([0,1,2,3,4])

# turn your model function Funk into an lmfit Model
mymodel = Model(Funk)

# create parameters, giving initial values. Note that parameters are
# named using the names of your function's argument and that keyword 
# arguments with non-numeric defaults like `good_points` are seen to
#  *not* be parameters. Like the independent variable `x`, you'll 
# need to pass that in when you do the fit.
# also: parameters can be fixed, or given `min` and `max` attributes

params = mymodel.make_params(k=1.4,  y0=0.2)
params['k'].min = 0

# do the fit to the 'good data', passing in the parameters, the 
# independent variable `x` and the `good_points` mask.
result  = mymodel.fit(y[good_points], params, x=x, good_points=good_points)

# print out a report of best fit values, uncertainties, correlations, etc.
print(result.fit_report())

# plot the results. result.best_fit is already restricted to the good
# points, because good_points was passed to the model during the fit.
plt.plot(x, y, 'o', label='all data')
plt.plot(x[good_points], result.best_fit, label='fit to good data')
plt.legend()
plt.show()

This will print out

[[Model]]
    Model(Funk)
[[Fit Statistics]]
    # fitting method   = leastsq
    # function evals   = 7
    # data points      = 5
    # variables        = 2
    chi-square         = 0.02302999
    reduced chi-square = 0.00767666
    Akaike info crit   = -22.9019787
    Bayesian info crit = -23.6831029
[[Variables]]
    k:   1.02460577 +/- 0.02770680 (2.70%) (init = 1.4)
    y0: -0.04135096 +/- 0.09189305 (222.23%) (init = 0.2)
[[Correlations]] (unreported correlations are < 0.100)
    C(k, y0) = -0.905

and produce a plot of the data and the fit:

[Image: all data points with the fit to the good data]

Hope that helps get you started.
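For the workflow described in the question (a large masked array processed in chunks of 100 points), the index array of good points can be derived from the masks of each chunk instead of being typed by hand. This is a rough sketch using `curve_fit` and synthetic data. The arrays `x_all` and `y_all`, the chunk boundaries and the fake blunders are all made up for illustration.

```python
import numpy as np
import numpy.ma as ma
from scipy.optimize import curve_fit

def Funk(x, k, y0):
    return k*x + y0

# hypothetical stand-in for the large data set: y = 2*x + 1 with some blunders
x_all = ma.masked_array(np.arange(1000, dtype=float))
y_all = ma.masked_array(2.0*x_all.data + 1.0)
y_all[::50] = 1e3        # fake blunders every 50th point
y_all[::50] = ma.masked  # mask them; the bad values stay in y_all.data

# slicing a masked array slices its mask along with its data
chunk = slice(200, 300)
x_c, y_c = x_all[chunk], y_all[chunk]

# indices (within the chunk) of the points that are unmasked in both arrays
good_points = np.flatnonzero(~(ma.getmaskarray(x_c) | ma.getmaskarray(y_c)))

# fit only the good points of this chunk
popt, pcov = curve_fit(Funk, x_c.data[good_points], y_c.data[good_points])
```

Because the slices are views, no copy of the full data set is made. Only the good points of the current chunk are copied when they are selected by index.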
