In R, we could create the correlation matrix like this:
makecov <- function(rho, n) {
  m <- matrix(nrow = n, ncol = n)
  m <- ifelse(row(m) == col(m), 1, rho)
  return(m)
}
Since we know the correlation, the result would be:
makecov(0.2,3)
#      [,1] [,2] [,3]
# [1,]  1.0  0.2  0.2
# [2,]  0.2  1.0  0.2
# [3,]  0.2  0.2  1.0
But in pandas, how could we create the same matrix efficiently? Here is my solution:
import numpy as np

def makecov(rho, n):
    m = [rho / 2] * n * n
    m = np.array(m).reshape([n, n])
    return m + m.T - np.diag([rho] * n) + np.diag([1] * n)
And the result would be:
In [21]: makecov(0.2, 3)
Out[21]:
array([[ 1. ,  0.2,  0.2],
       [ 0.2,  1. ,  0.2],
       [ 0.2,  0.2,  1. ]])
Is there a more elegant way to do that with pandas?
I would recommend using NumPy's covariance function, numpy.cov, instead: http://docs.scipy.org/doc/numpy/reference/generated/numpy.cov.html
Pandas in my experience is better used for data cleaning and whatnot. I usually let numpy do the heavy statistical lifting.
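For context, here is a minimal sketch of what numpy.cov does: it estimates a covariance matrix from observed data, rather than constructing one from a known correlation value. The random data below is made up purely for illustration.

```python
import numpy as np

# numpy.cov estimates covariance from sample data; by default each row
# is treated as one variable. Data here is illustrative only.
rng = np.random.default_rng(0)
data = rng.normal(size=(3, 100))  # 3 variables, 100 observations each

c = np.cov(data)  # 3x3 sample covariance matrix
print(c.shape)               # (3, 3)
print(np.allclose(c, c.T))   # covariance matrices are symmetric -> True
```

So numpy.cov is the right tool when you have data and want its covariance; the question above is the reverse problem (building a matrix from a fixed rho), which the answer below addresses directly.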
It looks like you could do

import numpy

def makecov(rho, n):
    # identity + rho: off-diagonal entries become rho, diagonal 1 + rho
    out = numpy.eye(n) + rho
    # reset the diagonal back to 1
    numpy.fill_diagonal(out, 1)
    return out
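For comparison, a sketch of the same construction using numpy.full, which fills the matrix with rho in one step and avoids the intermediate addition (the name makecov is kept from the question; this is just an alternative, not the answer's code):

```python
import numpy as np

def makecov(rho, n):
    # fill the whole n x n matrix with rho, then overwrite the diagonal with 1
    out = np.full((n, n), rho, dtype=float)
    np.fill_diagonal(out, 1.0)
    return out

print(makecov(0.2, 3))
```

Either way, fill_diagonal does the work of forcing the unit diagonal, so no explicit loop is needed.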