numpy - How to fit a double Gaussian distribution in Python? -
i trying obtain double gaussian distribution data (link) using python. raw data of form:
for given data, obtain 2 gaussian profiles peaks seen in figure. tried following code (source):
from sklearn import mixture import matplotlib.pyplot import matplotlib.mlab import numpy np pylab import * data = np.genfromtxt('gaussian_fit.dat', skiprows = 1) x = data[:, 0] y = data[:, 1] clf = mixture.gmm(n_components=2, covariance_type='full') clf.fit((y, x)) m1, m2 = clf.means_ w1, w2 = clf.weights_ c1, c2 = clf.covars_ fig = plt.figure(figsize = (5, 5)) plt.subplot(111) plotgauss1 = lambda x: plot(x,w1*matplotlib.mlab.normpdf(x,m1,np.sqrt(c1))[0], linewidth=3) plotgauss2 = lambda x: plot(x,w2*matplotlib.mlab.normpdf(x,m2,np.sqrt(c2))[0], linewidth=3) fig.savefig('gaussian_fit.pdf')
but not able desired output. so, how can double gaussian distribution obtained in python?
update
i able fit single gaussian distribution following code:
import pylab plb import matplotlib.pyplot plt scipy.optimize import curve_fit scipy import asarray ar,exp import numpy np data = np.genfromtxt('gaussian_fit.dat', skiprows = 1) x = data[:, 0] y = data[:, 1] n = len(x) mean = sum(x*y)/n sigma = sum(y*(x-mean)**2)/n def gaus(x,a,x0,sigma): return a*exp(-(x-x0)**2/(2*sigma**2)) popt,pcov = curve_fit(gaus, x, y ,p0 = [1, mean, sigma]) fig = plt.figure(figsize = (5, 5)) plt.subplot(111) plt.plot(x, y, label='raw') plt.plot(x, gaus(x, *popt), 'o', markersize = 4, label='gaussian fit') plt.xlabel('x') plt.ylabel('y') plt.legend() fig.savefig('gaussian_fit.pdf')
you can't use scikit-learn this, because not dealing set of samples distribution want estimate. of course transform curve pdf, sample , try fit using gaussian mixture model, seems bit of overkill me.
here's solution using simple least square curve fitting. work had remove background, i.e. ignore data points y < 5
, , provide starting vector leastsq
, can estimated form plot of data.
finding starting vector
the parameter vector that found least squares method vector
params = [c1, mu1, sigma1, c2, mu2, sigma2]
here, c1
, c2
scaling factors 2 gaussians, i.e. height, mu1
and mu2
means, i.e. horizontal positions of peaks , sigma1
, sigma2
standard deviations determine width of gaussians. find starting vector looked @ plot of data , estimated height of 2 peaks ( = c1
, c2
, respectively) , horizontal position (= mu1
, mu1
, respectively). sigma1
, sigma2
set 1.0
.
code
from sklearn import mixture import matplotlib.pyplot import matplotlib.mlab import numpy np pylab import * scipy.optimize import leastsq data = np.genfromtxt('gaussian_fit.dat', skiprows = 1) x = data[:, 0] y = data[:, 1] def double_gaussian( x, params ): (c1, mu1, sigma1, c2, mu2, sigma2) = params res = c1 * np.exp( - (x - mu1)**2.0 / (2.0 * sigma1**2.0) ) \ + c2 * np.exp( - (x - mu2)**2.0 / (2.0 * sigma2**2.0) ) return res def double_gaussian_fit( params ): fit = double_gaussian( x, params ) return (fit - y_proc) # remove background. y_proc = np.copy(y) y_proc[y_proc < 5] = 0.0 # least squares fit. starting values found inspection. fit = leastsq( double_gaussian_fit, [13.0,-13.0,1.0,60.0,3.0,1.0] ) plot( x, y, c='b' ) plot( x, double_gaussian( x, fit[0] ), c='r' )
Comments
Post a Comment