
How to create a QQ plot between two samples of different sizes in Python?

I have an original data sample and a simulated sample (don't ask me how I simulated it), and I want to check whether their histograms match. The best way seems to be a QQ plot, but the statsmodels library does not allow samples of different sizes.

Constructing a QQ plot involves finding corresponding quantiles in both sets and plotting them against one another. When one set is larger than the other, common practice is to take the quantile levels of the smaller set and use linear interpolation to estimate the corresponding quantiles in the larger set. This is described here: http://www.itl.nist.gov/div898/handbook/eda/section3/qqplot.htm

This is relatively straightforward to do manually:

import numpy as np
import matplotlib.pyplot as plt

test1 = np.random.normal(0, 1, 1000)
test2 = np.random.normal(0, 1, 800)

# Sort each sample and assign quantile level i/n to the i-th sorted value
test1.sort()
quantile_levels1 = np.arange(len(test1), dtype=float) / len(test1)

test2.sort()
quantile_levels2 = np.arange(len(test2), dtype=float) / len(test2)

# Use the quantile levels of the smaller set (test2) to create the plot
quantile_levels = quantile_levels2

# The sorted smaller sample already holds its own quantiles
quantiles2 = test2

# Estimate the quantiles of the larger set at the same levels by linear interpolation
quantiles1 = np.interp(quantile_levels, quantile_levels1, test1)

# Plot the matched quantiles to create the QQ plot
plt.plot(quantiles1, quantiles2)

# Add a y = x reference line spanning the range of both samples
maxval = max(test1[-1], test2[-1])
minval = min(test1[0], test2[0])
plt.plot([minval, maxval], [minval, maxval], 'k-')

plt.show()
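
The same idea can be wrapped in a small reusable function. The following is only a sketch: the name qq_points is hypothetical, and it makes two assumptions that differ from the manual steps above. It uses np.quantile (linear interpolation by default) on both samples, and it uses plotting positions (i + 0.5) / n rather than i / n, so the resulting points will not be numerically identical to the ones computed above.

```python
import numpy as np

def qq_points(sample_a, sample_b):
    """Return matched quantiles (qa, qb) for two 1-D samples of possibly different size.

    Hypothetical helper for illustration; not part of NumPy or statsmodels.
    """
    a = np.asarray(sample_a, dtype=float)
    b = np.asarray(sample_b, dtype=float)
    # Use as many quantile levels as the smaller sample has points
    n = min(len(a), len(b))
    # Assumed convention: mid-point plotting positions, strictly inside (0, 1)
    levels = (np.arange(n) + 0.5) / n
    # np.quantile interpolates linearly between order statistics by default
    qa = np.quantile(a, levels)
    qb = np.quantile(b, levels)
    return qa, qb

rng = np.random.default_rng(0)
qa, qb = qq_points(rng.normal(0, 1, 1000), rng.normal(0, 1, 800))
```

The returned arrays qa and qb have the same length and can be plotted against each other, together with a y = x reference line, in the same way as quantiles1 and quantiles2 above.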
