I would like to fit multiple Gaussian curves to Mass spectrometry data in Python. Right now I'm fitting the data one Gaussian at a time -- literally one range at a time.
Is there a more streamlined way to do this? Is there a way I can run the data through a loop to plot a Gaussian at each peak? I'm guessing there's gotta be a better way, but I've combed through the internet.
My graph for two Gaussians is shown below.
My example data can be found at: http://txt.do/dooxv
And here's my current code:
import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize as opt
from scipy.interpolate import interp1d
RGAdata = np.loadtxt("/Users/ilenemitchell/Desktop/RGAscan.txt", skiprows=14)
RGAdata=RGAdata.transpose()
x=RGAdata[0]
y=RGAdata[1]
# graph labels
plt.ylabel('ion current')
plt.xlabel('mass/charge ratio')
plt.xticks(np.arange(min(RGAdata[0]), max(RGAdata[0])+2, 2.0))
plt.ylim([10**-12.5, 10**-9])
plt.title('RGA Data Jul 25, 2017')
plt.semilogy(x, y,'b')
#fitting a guassian to a peak
def gauss(x, a, mu, sig):
return a*np.exp(-(x-mu)**2/(2*sig**2))
fitx=x[(x>40)*(x<43)]
fity=y[(x>40)*(x<43)]
mu=np.sum(fitx*fity)/np.sum(fity)
sig=np.sqrt(np.sum(fity*(fitx-mu)**2)/np.sum(fity))
print (mu, sig, max(fity))
popt, pcov = opt.curve_fit(gauss, fitx, fity, p0=[max(fity),mu, sig])
plt.semilogy(x, gauss(x, popt[0],popt[1],popt[2]), 'r-', label='fit')
#second guassian
fitx2=x[(x>26)*(x<31)]
fity2=y[(x>26)*(x<31)]
mu=np.sum(fitx2*fity2)/np.sum(fity2)
sig=np.sqrt(np.sum(fity2*(fitx2-mu)**2)/np.sum(fity2))
print (mu, sig, max(fity2))
popt2, pcov2 = opt.curve_fit(gauss, fitx2, fity2, p0=[max(fity2),mu, sig])
plt.semilogy(x, gauss(x, popt2[0],popt2[1],popt2[2]), 'm', label='fit2')
plt.show()
Here's some sample code of identifying peaks in a data set to get you started. You can find a link to all the examples here .
import numpy as np
import peakutils
cb = np.array([-0.010223, ... ])
indexes = peakutils.indexes(cb, thres=0.02/max(cb), min_dist=100)
# [ 333 693 1234 1600]
interpolatedIndexes = peakutils.interpolate(range(0, len(cb)), cb, ind=indexes)
# [ 332.61234263 694.94831376 1231.92840845 1600.52446335]
In addition to Alex F's answer, you need to identify peaks and analyze their surroundings to identify the xmin
and xmax
values.
If you have done that, you can use this slightly refactored code and the loop within to plot all relevant data
import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize as opt
from scipy.interpolate import interp1d
def _gauss(x, a, mu, sig):
return a*np.exp(-(x-mu)**2/(2*sig**2))
def gauss(x, y, xmin, xmax):
fitx = x[(x>xmin)*(x<xmax)]
fity = y[(x>xmin)*(x<xmax)]
mu = np.sum(fitx*fity)/np.sum(fity)
sig = np.sqrt(np.sum(fity*(fitx-mu)**2)/np.sum(fity))
print (mu, sig, max(fity))
popt, pcov = opt.curve_fit(_gauss, fitx, fity, p0=[max(fity), mu, sig])
return _gauss(x, popt[0], popt[1], popt[2])
# Load data and define x - y
RGAdata = np.loadtxt("/Users/ilenemitchell/Desktop/RGAscan.txt", skiprows=14)
x, y = RGAdata.T
# Create the plot
fig, ax = plt.subplots()
ax.semilogy(x, y, 'b')
# Plot the Gaussian's between xmin and xmax
for xmin, xmax in [(40, 43), (26, 31)]:
yG = gauss(x, y, xmin, xmax)
ax.semilogy(x, yG)
# Prettify the graph
ax.set_xlabel("mass/charge ratio")
ax.set_ylabel("ion current")
ax.set_xticks(np.arange(min(x), max(x)+2, 2.0))
ax.set_ylim([10**-12.5, 10**-9])
ax.set_title("RGA Data Jul 25, 2017")
plt.show()
You may find the lmfit module ( https://lmfit.github.io/lmfit-py/ ) helpful. This provides a pre-built GaussianModel class for fitting a peak to a single Gaussian and supports adding multiple Models (not necessarily Gaussians, but also other peak models and other functions that might be useful for backgrounds and so for) into a composite model that can be fit at once.
Lmfit supports fixing or giving a range to some Parameters, so that you could build a model as a sum of Gaussians with fixed positions, limiting the value for the centroid to vary with some range (so the peaks cannot get confused). In addition, you can impose simple mathematical constraints on parameter values, so that you might require that all peak widths are the same size (or related in some simple form).
In particular, you might look to https://lmfit.github.io/lmfit-py/builtin_models.html#example-3-fitting-multiple-peaks-and-using-prefixes for an example a fit using 2 Gaussians and a background function.
For peak finding, I've found scipy.signal.find_peaks_cwt
to be pretty good.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.