Estimation of multivariate t distribution in R

I would like to know if there is any function in R that can estimate the degrees of freedom (df) of a multivariate t distribution.

The problem is simple: I have a matrix of 5 variables (columns) with 75 observations (rows). I would like to estimate the df of a multivariate t distribution from that sample.

Thanks,

Juan.

Edit: following fabians' suggestion, I implemented the dmvt() approach:

# "residuals" is a matrix with residuals from a model. I want to estimate the df of  
# that sample assuming multivariate-t

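library(mvtnorm)

# sample correlation matrix of the residuals
sigma <- cor(residuals, use = "pairwise.complete.obs", method = "pearson")

# column means of the data (assumed here to be the same matrix as `residuals`)
my_means <- colMeans(residuals)

# scale() centers each column and divides it by its standard deviation
residuals.scaled <- scale(residuals)

# density of each observation under a multivariate t with df = 1
df.1 <- dmvt(residuals.scaled, delta = my_means, sigma = sigma,
             df = 1, log = FALSE, type = "shifted")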

I have some doubts about this:

1) Scaling: scale() also centers the data. I don't know if this is correct.
2) I used log = FALSE because I don't see why the densities should be given as log(d) in my case.
3) From here I should compute the likelihood of the sample data for each df. So more lines like df.2, df.3, etc. should be added, and the likelihood calculated for each one. Then I choose the df with the highest likelihood. Is that correct?
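On point 2, a small simulated illustration may help show why log densities are usually summed instead of multiplying raw densities. This is only a sketch: the data are simulated (not the poster's residuals), and the sample size, dimension, and identity scale matrix are assumptions chosen for the demo.

```r
library(mvtnorm)

set.seed(42)
n <- 300  # number of observations (assumed for the demo)
p <- 8    # number of variables (assumed for the demo)

# simulate multivariate t data with identity scale matrix and df = 3
X_demo <- rmvt(n, sigma = diag(p), df = 3)

# per-observation densities, on the raw and on the log scale
dens     <- dmvt(X_demo, sigma = diag(p), df = 3, log = FALSE)
log_dens <- dmvt(X_demo, sigma = diag(p), df = 3, log = TRUE)

# the product of many small densities underflows to 0 in double precision,
# while the sum of log densities stays finite and comparable across df values
prod_lik   <- prod(dens)
sum_loglik <- sum(log_dens)

prod_lik
sum_loglik
```

With a product that collapses to zero for every candidate df, there is nothing left to compare, which is why the log scale is the practical choice.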

Package mvtnorm supplies the density of a (shifted) multivariate t distribution in the function dmvt. You could enter your (scaled) data and its sample correlation, and compute the likelihood of your data for different values of df. Pick the value of df that maximizes the likelihood of your data.
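For reference, the quantity being maximized can be written out. This is the standard shifted multivariate t density, which is my understanding of what dmvt evaluates with type = "shifted"; the symbols n (observations), p (variables), \mu (location), \Sigma (scale matrix) and \nu (df) are notation introduced here, not taken from the post.

```latex
f(x \mid \mu, \Sigma, \nu) =
  \frac{\Gamma\!\left(\frac{\nu + p}{2}\right)}
       {\Gamma\!\left(\frac{\nu}{2}\right) (\nu\pi)^{p/2} \, |\Sigma|^{1/2}}
  \left[ 1 + \frac{1}{\nu} (x - \mu)^\top \Sigma^{-1} (x - \mu) \right]^{-\frac{\nu + p}{2}}

\ell(\nu) = \sum_{i=1}^{n} \log f(x_i \mid \mu, \Sigma, \nu),
\qquad
\hat{\nu} = \arg\max_{\nu} \, \ell(\nu)
```

With centered and scaled data, \mu is taken as 0 and \Sigma is replaced by the sample correlation matrix, so \ell depends on \nu alone.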

EDIT:

library(mvtnorm)
set.seed(12121212)
################################################################################
## simulate n vectors of p-dim. t-distributed data in matrix X:
n <- 300
p <- 8

# draw random column means
means <- 10 * rnorm(p)

# correlation is AR(1) with correlation rho=.8
rho <- 0.8
sigma <- rho ^ abs(outer(1:p, 1:p, "-"))

# column scale factors are sqrt(1:p); df is the true degrees of freedom
df <- 3
X <- t(t(rmvt(n, sigma=sigma, delta=means, df=df)) * sqrt(1:p))


################################################################################
# evaluate t-likelihood for scaled X:

X_scale <- scale(X)
sigma_est <- cor(X_scale)

df_candidates <- seq(1, 20, by=2)
loglik <- numeric(length(df_candidates))
names(loglik) <- df_candidates
for(nu in df_candidates){
    # no need for delta since we're working on scaled & centered data.
    # use sum(log(likelihood)), not prod(likelihood), to avoid numeric over/underflow
    loglik[as.character(nu)] <- sum(dmvt(x=X_scale, sigma=sigma_est,
                                         df=nu, log=TRUE))
}
loglik
#        1         3         5         7         9        11        13 
#-1788.219 -1756.301 -1768.885 -1783.724 -1797.386 -1809.556 -1820.382 
#       15        17        19 
#-1830.066 -1838.788 -1846.698 
## --> maximal for df=3, as used for the simulation.

## verify that the mean shift can be incorporated into pre-processing as above:
all.equal(dmvt(X[1,], delta=means), dmvt(X[1,] - means))
#[1] TRUE
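The grid search over odd df values could also be replaced by a one-dimensional numerical maximization, since the log-likelihood is a function of df alone once the data are scaled and the correlation is plugged in. Below is a minimal sketch of that idea. The helper name estimate_df, its search interval, and the simulated demo data are all assumptions made for illustration; nothing here is part of mvtnorm beyond dmvt and rmvt.

```r
library(mvtnorm)

# Hypothetical helper (not part of mvtnorm): maximizes the t log-likelihood
# over df for a centered and scaled data matrix, plugging in the sample
# correlation as the scale matrix, in the same way as the grid search.
estimate_df <- function(X, interval = c(1, 50)) {
  X_scale   <- scale(X)       # center and scale each column
  sigma_est <- cor(X_scale)   # plug-in scale matrix

  # log-likelihood as a function of the degrees of freedom only
  loglik <- function(nu) {
    sum(dmvt(X_scale, sigma = sigma_est, df = nu, log = TRUE))
  }

  opt <- optimize(loglik, interval = interval, maximum = TRUE)
  c(df = opt$maximum, loglik = opt$objective)
}

# demo on simulated data (assumed sizes; true df = 3)
set.seed(1)
X_demo <- rmvt(300, sigma = diag(5), df = 3)
est <- estimate_df(X_demo)
est
```

For the 75 x 5 matrix in the question, the call would be estimate_df(residuals), assuming the matrix has no missing values. One caveat: for df > 2 the covariance of a multivariate t is df / (df - 2) times its scale matrix, so using the sample correlation as the scale matrix for every df is an approximation, and the resulting estimate should be read with that in mind.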
