
How can I fit multiple Gaussian curves to mass spectrometry data in Python?

I would like to fit multiple Gaussian curves to mass spectrometry data in Python. Right now I'm fitting the data one Gaussian at a time -- literally one range at a time.

Is there a more streamlined way to do this? Is there a way I can run the data through a loop to plot a Gaussian at each peak? I'm guessing there's gotta be a better way, but I've combed through the internet without finding one.

My graph for two Gaussians is shown below.

[Image: mass spectrum with two Gaussian fits, py.plot]

My example data can be found at: http://txt.do/dooxv

And here's my current code:

import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize as opt

from scipy.interpolate import interp1d

RGAdata = np.loadtxt("/Users/ilenemitchell/Desktop/RGAscan.txt", skiprows=14)
RGAdata=RGAdata.transpose()

x=RGAdata[0]
y=RGAdata[1]

# graph labels
plt.ylabel('ion current')
plt.xlabel('mass/charge ratio')
plt.xticks(np.arange(min(RGAdata[0]), max(RGAdata[0])+2, 2.0))
plt.ylim([10**-12.5, 10**-9])
plt.title('RGA Data Jul 25, 2017')

plt.semilogy(x, y,'b')

# fit a Gaussian to a peak

def gauss(x, a, mu, sig):
    return a*np.exp(-(x-mu)**2/(2*sig**2))


fitx=x[(x>40)*(x<43)]
fity=y[(x>40)*(x<43)]
mu=np.sum(fitx*fity)/np.sum(fity)
sig=np.sqrt(np.sum(fity*(fitx-mu)**2)/np.sum(fity))

print (mu, sig, max(fity))

popt, pcov = opt.curve_fit(gauss, fitx, fity, p0=[max(fity),mu, sig])
plt.semilogy(x, gauss(x, popt[0],popt[1],popt[2]), 'r-', label='fit')

# second Gaussian

fitx2=x[(x>26)*(x<31)]
fity2=y[(x>26)*(x<31)]
mu=np.sum(fitx2*fity2)/np.sum(fity2)
sig=np.sqrt(np.sum(fity2*(fitx2-mu)**2)/np.sum(fity2))

print (mu, sig, max(fity2))

popt2, pcov2 = opt.curve_fit(gauss, fitx2, fity2, p0=[max(fity2),mu, sig])
plt.semilogy(x, gauss(x, popt2[0],popt2[1],popt2[2]), 'm', label='fit2')

plt.show()

Here's some sample code for identifying peaks in a data set, using the peakutils package, to get you started. You can find a link to all the examples here.

import numpy as np
import peakutils
cb = np.array([-0.010223, ... ])
indexes = peakutils.indexes(cb, thres=0.02/max(cb), min_dist=100)
# [ 333  693 1234 1600]

interpolatedIndexes = peakutils.interpolate(range(0, len(cb)), cb, ind=indexes)
# [  332.61234263   694.94831376  1231.92840845  1600.52446335]

In addition to Alex F's answer, you need to identify peaks and analyze their surroundings to identify the xmin and xmax values.
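
One way to get those xmin and xmax values automatically is sketched below. It swaps in scipy.signal.find_peaks and scipy.signal.peak_widths (not the peakutils calls above), and it runs on noise-free synthetic data, so the peak positions, widths, the prominence threshold and rel_height=0.9 are all assumptions that would need tuning for a real scan.

```python
import numpy as np
from scipy.signal import find_peaks, peak_widths

# Synthetic stand-in for the spectrum (hypothetical values, no noise):
# two Gaussian peaks on a flat baseline.
x = np.linspace(20.0, 50.0, 601)
y = (1.0e-10 * np.exp(-(x - 28.0) ** 2 / (2 * 0.4 ** 2))
     + 5.0e-11 * np.exp(-(x - 41.5) ** 2 / (2 * 0.3 ** 2))
     + 1.0e-12)

# Keep only peaks whose prominence exceeds 10% of the tallest point.
peaks, _ = find_peaks(y, prominence=0.1 * y.max())

# Measure each peak's width 90% of the way down its prominence.
# The returned left/right positions are fractional sample indices.
_, _, left_ips, right_ips = peak_widths(y, peaks, rel_height=0.9)

# Convert fractional sample indices back to x values.
sample_idx = np.arange(x.size)
xmins = np.interp(left_ips, sample_idx, x)
xmaxs = np.interp(right_ips, sample_idx, x)
windows = list(zip(xmins, xmaxs))

for center, (xmin, xmax) in zip(x[peaks], windows):
    print(f"peak near {center:.2f}: fit window ({xmin:.2f}, {xmax:.2f})")
```

The resulting windows list can stand in for a hard-coded list of (xmin, xmax) pairs. On noisy data, smoothing first or raising the prominence threshold may be needed to avoid spurious peaks.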

If you have done that, you can use this slightly refactored code, and the loop within it, to plot all relevant data:

import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize as opt

from scipy.interpolate import interp1d

def _gauss(x, a, mu, sig):
    return a*np.exp(-(x-mu)**2/(2*sig**2))

def gauss(x, y, xmin, xmax):
    fitx = x[(x>xmin)*(x<xmax)]
    fity = y[(x>xmin)*(x<xmax)]
    mu = np.sum(fitx*fity)/np.sum(fity)
    sig = np.sqrt(np.sum(fity*(fitx-mu)**2)/np.sum(fity))

    print (mu, sig, max(fity))

    popt, pcov = opt.curve_fit(_gauss, fitx, fity, p0=[max(fity), mu, sig])
    return _gauss(x, popt[0], popt[1], popt[2])

# Load data and define x - y
RGAdata = np.loadtxt("/Users/ilenemitchell/Desktop/RGAscan.txt", skiprows=14)
x, y = RGAdata.T

# Create the plot
fig, ax = plt.subplots()
ax.semilogy(x, y, 'b')

# Plot the Gaussians between xmin and xmax
for xmin, xmax in [(40, 43), (26, 31)]:
    yG = gauss(x, y, xmin, xmax)
    ax.semilogy(x, yG)

# Prettify the graph
ax.set_xlabel("mass/charge ratio")
ax.set_ylabel("ion current")
ax.set_xticks(np.arange(min(x), max(x)+2, 2.0))
ax.set_ylim([10**-12.5, 10**-9])
ax.set_title("RGA Data Jul 25, 2017")

plt.show()

You may find the lmfit module (https://lmfit.github.io/lmfit-py/) helpful. It provides a pre-built GaussianModel class for fitting a single Gaussian to a peak, and supports adding multiple Models (not necessarily Gaussians; other peak models and other functions that might be useful for backgrounds and so forth also work) into a composite model that can be fit all at once.

Lmfit supports fixing Parameters or giving them a range, so you could build a model as a sum of Gaussians with fixed positions, or limit each centroid to vary within some range (so the peaks cannot get confused). In addition, you can impose simple mathematical constraints on parameter values, so that you might require that all peak widths are the same (or related in some simple form).

In particular, you might look at https://lmfit.github.io/lmfit-py/builtin_models.html#example-3-fitting-multiple-peaks-and-using-prefixes for an example of a fit using 2 Gaussians and a background function.

For peak finding, I've found scipy.signal.find_peaks_cwt to be pretty good.
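
A minimal sketch of that call, on synthetic data in arbitrary units, might look like the following. The widths range is an assumption chosen to bracket the expected peak widths in samples. On real data the function can return extra candidate indices, so the result should be checked against the plot.

```python
import numpy as np
from scipy.signal import find_peaks_cwt

# Synthetic two-peak data with a little noise (hypothetical values).
rng = np.random.default_rng(0)
x = np.linspace(20.0, 50.0, 601)
y = (1.0 * np.exp(-(x - 28.0) ** 2 / (2 * 0.4 ** 2))
     + 0.5 * np.exp(-(x - 41.5) ** 2 / (2 * 0.3 ** 2))
     + rng.normal(0.0, 0.005, x.size))

# widths is given in samples; the peaks here have sigmas of 6 to 8 samples,
# so a range of 4 to 15 samples brackets them.
peak_idx = np.asarray(find_peaks_cwt(y, widths=np.arange(4, 16)), dtype=int)

# Candidate peak positions in x units.
print(x[peak_idx])
```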
