
How to use pandas to create the correlation matrix of a multivariate normal distribution?

In R, we could create the correlation matrix like this:

makecov <- function(rho,n) {
    m <- matrix(nrow=n,ncol=n)
    m <- ifelse(row(m)==col(m),1,rho)
    return(m)
}

Given a known correlation, the result would be:

makecov(0.2,3)
#     [,1] [,2] [,3]
#[1,]  1.0  0.2  0.2
#[2,]  0.2  1.0  0.2
#[3,]  0.2  0.2  1.0

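But in pandas, how could we create the same matrix efficiently? Here is my solution:

import numpy as np

def makecov(rho, n):
    # Start with rho/2 everywhere so that m + m.T is rho everywhere.
    m = [rho / 2] * n * n
    m = np.array(m).reshape([n, n])
    # Swap the diagonal from rho to 1.
    return m + m.T - np.diag([rho] * n) + np.diag([1] * n)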

And the result would be:

In [21]:makecov(0.2,3)
Out[21]: 
array([[ 1. ,  0.2,  0.2],
       [ 0.2,  1. ,  0.2],
       [ 0.2,  0.2,  1. ]])

Is there a more elegant way to do that with pandas?

I would recommend using NumPy's covariance function, numpy.cov, instead: http://docs.scipy.org/doc/numpy/reference/generated/numpy.cov.html

In my experience, pandas is better suited to data cleaning and whatnot. I usually let NumPy do the heavy statistical lifting.
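For context, numpy.cov estimates a covariance matrix from observed samples; it does not build one from a given rho. Here is a rough sketch of how the two relate, assuming a zero mean vector, unit variances, and an arbitrarily chosen seed and sample size (none of which come from the question):

```python
import numpy as np

rho, n = 0.2, 3

# Target matrix: ones on the diagonal, rho everywhere else.
target = np.full((n, n), rho)
np.fill_diagonal(target, 1.0)

# Draw samples from a zero-mean multivariate normal with that covariance.
# The seed and sample size are arbitrary choices for illustration.
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(np.zeros(n), target, size=100_000)

# numpy.cov treats rows as variables by default; our variables are columns.
estimated = np.cov(samples, rowvar=False)
print(estimated.round(2))
```

With this many samples the estimate should land close to the target matrix, but it is only an estimate, so it will not match exactly.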

It looks like you could do

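import numpy

def makecov(rho, n):
    # eye(n) + rho puts rho off the diagonal and 1 + rho on it.
    out = numpy.eye(n) + rho
    # Reset the diagonal to exactly 1.
    numpy.fill_diagonal(out, 1)
    return out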
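A possible variant of the same idea, in case a pandas object is wanted at the end: fill the array with rho directly, set the diagonal, and wrap the result in a DataFrame. The names makecov_full and makecov_frame are hypothetical, made up for this sketch:

```python
import numpy as np
import pandas as pd

def makecov_full(rho, n):
    # Hypothetical helper: n x n array of rho with ones on the diagonal.
    out = np.full((n, n), float(rho))
    np.fill_diagonal(out, 1.0)
    return out

def makecov_frame(rho, n):
    # Hypothetical helper: the same matrix as a pandas DataFrame
    # with default integer row and column labels.
    return pd.DataFrame(makecov_full(rho, n))

print(makecov_frame(0.2, 3))
```

The construction itself is still plain NumPy; pandas only adds labels on top, which fits the point above about letting NumPy do the numerical work.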
