numpy - How to fit a double Gaussian distribution in Python? -

- July 15, 2011

i trying obtain double gaussian distribution data (link) using python. raw data of form:

for given data, obtain 2 gaussian profiles peaks seen in figure. tried following code (source):

from sklearn import mixture import matplotlib.pyplot import matplotlib.mlab import numpy np pylab import * data = np.genfromtxt('gaussian_fit.dat', skiprows = 1) x = data[:, 0] y = data[:, 1] clf = mixture.gmm(n_components=2, covariance_type='full') clf.fit((y, x)) m1, m2 = clf.means_ w1, w2 = clf.weights_ c1, c2 = clf.covars_ fig = plt.figure(figsize = (5, 5)) plt.subplot(111) plotgauss1 = lambda x: plot(x,w1*matplotlib.mlab.normpdf(x,m1,np.sqrt(c1))[0], linewidth=3) plotgauss2 = lambda x: plot(x,w2*matplotlib.mlab.normpdf(x,m2,np.sqrt(c2))[0], linewidth=3) fig.savefig('gaussian_fit.pdf')

but not able desired output. so, how can double gaussian distribution obtained in python?

update

i able fit single gaussian distribution following code:

import pylab plb import matplotlib.pyplot plt scipy.optimize import curve_fit scipy import asarray ar,exp import numpy np  data = np.genfromtxt('gaussian_fit.dat', skiprows = 1) x = data[:, 0] y = data[:, 1] n = len(x) mean = sum(x*y)/n sigma = sum(y*(x-mean)**2)/n   def gaus(x,a,x0,sigma):     return a*exp(-(x-x0)**2/(2*sigma**2))   popt,pcov = curve_fit(gaus, x, y ,p0 = [1, mean, sigma])   fig = plt.figure(figsize = (5, 5)) plt.subplot(111) plt.plot(x, y, label='raw') plt.plot(x, gaus(x, *popt), 'o', markersize = 4, label='gaussian fit') plt.xlabel('x') plt.ylabel('y') plt.legend() fig.savefig('gaussian_fit.pdf')

you can't use scikit-learn this, because not dealing set of samples distribution want estimate. of course transform curve pdf, sample , try fit using gaussian mixture model, seems bit of overkill me.

here's solution using simple least square curve fitting. work had remove background, i.e. ignore data points y < 5, , provide starting vector leastsq, can estimated form plot of data.

finding starting vector

the parameter vector that found least squares method vector

params = [c1, mu1, sigma1, c2, mu2, sigma2]

here, c1 , c2 scaling factors 2 gaussians, i.e. height, mu1and mu2 means, i.e. horizontal positions of peaks , sigma1 , sigma2 standard deviations determine width of gaussians. find starting vector looked @ plot of data , estimated height of 2 peaks ( = c1, c2, respectively) , horizontal position (= mu1, mu1, respectively). sigma1 , sigma2 set 1.0.

code

from sklearn import mixture import matplotlib.pyplot import matplotlib.mlab import numpy np pylab import * scipy.optimize import leastsq  data = np.genfromtxt('gaussian_fit.dat', skiprows = 1) x = data[:, 0] y = data[:, 1]  def double_gaussian( x, params ):     (c1, mu1, sigma1, c2, mu2, sigma2) = params     res =   c1 * np.exp( - (x - mu1)**2.0 / (2.0 * sigma1**2.0) ) \           + c2 * np.exp( - (x - mu2)**2.0 / (2.0 * sigma2**2.0) )     return res  def double_gaussian_fit( params ):     fit = double_gaussian( x, params )     return (fit - y_proc)  # remove background. y_proc = np.copy(y) y_proc[y_proc < 5] = 0.0  # least squares fit. starting values found inspection. fit = leastsq( double_gaussian_fit, [13.0,-13.0,1.0,60.0,3.0,1.0] ) plot( x, y, c='b' ) plot( x, double_gaussian( x, fit[0] ), c='r' )

Search This Blog

WIKI