View the values that the function boot uses for bootstrap estimates

Question

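I have written the code below to obtain a bootstrap estimate of the mean. My goal is to see which numbers the function boot in the boot package selects from the data set, preferably in the order in which they were selected.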

The data set contains only three numbers, 1, 10 and 100, and I am using only two bootstrap samples.

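The estimated mean is 23.5, and the R code below shows that the six numbers drawn must have included one '1', four '10's and one '100'. However, there are 30 possible orderings of these six numbers, and every one of them gives a mean of 23.5.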
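The count of 30 can be checked directly. Six draws averaging 23.5 must total 141, and the only way to reach 141 with six values from {1, 10, 100} is one 1, four 10s and one 100. The number of distinct orderings of that multiset is the multinomial coefficient 6!/(1! 4! 1!). A small base-R sketch of both facts:

```r
# Number of distinct orderings of the multiset {1, 10, 10, 10, 10, 100}
n.orderings <- factorial(6) / (factorial(1) * factorial(4) * factorial(1))
n.orderings

# All count triples (a ones, b tens, c hundreds) with a + b + c = 6 ...
counts <- expand.grid(a = 0:6, b = 0:6, c = 0:6)
counts <- counts[rowSums(counts) == 6, ]

# ... of which only one gives a total of 141 (overall mean 23.5)
hits <- counts[counts$a * 1 + counts$b * 10 + counts$c * 100 == 141, ]
hits
```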

Is there a way to determine which of these 30 possible orderings is the one that actually occurred in the two bootstrap samples?

library(boot)

set.seed(1234)

dat <- c(1, 10, 100)
av  <- function(dat, i) { sum(dat[i])/length(dat[i]) }
av.boot <- boot(dat, av, R = 2)
av.boot
#
# ORDINARY NONPARAMETRIC BOOTSTRAP
#
#
# Call:
# boot(data = dat, statistic = av, R = 2)
#
#
# Bootstrap Statistics :
#     original  bias    std. error
# t1*       37   -13.5    19.09188
#

mean(dat) + (-13.5)  # original estimate + bias = mean of the bootstrap replicates
# [1] 23.5
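The line above works because boot reports the bias as the mean of the bootstrap replicates minus the original estimate, so the original estimate plus the bias is just the average of the replicates. The replicate values themselves are stored in the `t` component of the returned object and the original estimate in `t0`. A short sketch of reading them directly; note that the printed numbers in this thread date from 2015, and since R 3.6.0 changed the default `sample.kind`, `set.seed(1234)` may now produce different resamples (`RNGkind(sample.kind = "Rounding")` restores the old behaviour).

```r
library(boot)

set.seed(1234)
dat <- c(1, 10, 100)
av  <- function(dat, i) { sum(dat[i]) / length(dat[i]) }
av.boot <- boot(dat, av, R = 2)

av.boot$t0                    # original estimate, mean(dat) = 37
av.boot$t                     # R x 1 matrix: one mean per bootstrap sample
mean(av.boot$t) - av.boot$t0  # the "bias" printed by boot
sd(av.boot$t[, 1])            # the "std. error" printed by boot
```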

# The two samples must have contained one '1', four '10's and one '100',
# but there are 30 possible orderings.
# Which of these 30 possible sequences actually occurred?

# This code shows there must have been one '1', four '10's and one '100'
# and lists the 30 possible sequences

my.combos <- expand.grid(V1  = c(1, 10, 100),
                         V2  = c(1, 10, 100),
                         V3  = c(1, 10, 100),
                         V4  = c(1, 10, 100),
                         V5  = c(1, 10, 100),
                         V6  = c(1, 10, 100))

# Mean of the two bootstrap sample means (each sample has three draws)
my.means <- apply(my.combos, 1, function(x) {
  ((x[1] + x[2] + x[3]) / 3 + (x[4] + x[5] + x[6]) / 3) / 2
})

# Keep only the sequences whose overall mean is 23.5 (30 rows of 6 draws)
possible.samples <- my.combos[my.means == 23.5, ]
dim(possible.samples)

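# Count how many times each value appears in each candidate sequence
n.1   <- rowSums(possible.samples == 1)
n.10  <- rowSums(possible.samples == 10)
n.100 <- rowSums(possible.samples == 100)

# Counts in the first candidate sequence: one '1', four '10's, one '100'
n.1[1]
n.10[1]
n.100[1]

# All three are TRUE: every candidate sequence has exactly the same counts
length(unique(n.1))   == 1
length(unique(n.10))  == 1
length(unique(n.100)) == 1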
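One way to see the resamples without digging into boot's internals is to have the statistic return the indices it receives along with the estimate. boot stores every value the statistic returns as a column of `t`, so the extra columns record which observations were drawn, in order. A minimal sketch; the function name `av.idx` is only illustrative:

```r
library(boot)

set.seed(1234)
dat <- c(1, 10, 100)

# Hypothetical statistic: the mean, followed by the indices of the resample
av.idx <- function(dat, i) c(sum(dat[i]) / length(dat[i]), i)

b <- boot(dat, av.idx, R = 2)

# Column 1 holds the replicate means; the remaining columns hold the indices
idx  <- b$t[, -1, drop = FALSE]
vals <- matrix(dat[as.vector(idx)], nrow = nrow(idx))  # indices -> data values
vals        # one row per bootstrap sample, in the order drawn
b$t[, 1]    # the corresponding sample means
```

The cost is a wider `t` matrix (and extra columns in the printed summary), but the recorded draws do not depend on reproducing the random-number state afterwards.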

Answer 1

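I think you can determine which observations were sampled, and in what order, with the code below. You have to extract the function ordinary.array from the boot package and paste it into your R code. Then reset the random seed to the value used before calling boot, and specify the values of n, R and strata, where n is the number of observations in the data set, R is the number of bootstrap replicates and strata is 1 for an ordinary (unstratified) bootstrap.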

I do not know how general this approach is, but it worked with the few simple examples I tried, including the one below.
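The boot package also exports `boot.array()`, which regenerates the resampling array from the seed stored in the boot object. With `indices = TRUE` it returns an R × n matrix whose row r lists the observation indices used in replicate r; for an ordinary bootstrap this should be the same matrix the pasted `ordinary.array` produces, without copying internal code. A sketch using the four-value example below, checking that the rows reproduce the stored replicates:

```r
library(boot)

set.seed(1234)
dat <- c(1, 10, 100, 1000)
av  <- function(dat, i) { sum(dat[i]) / length(dat[i]) }
av.boot <- boot(dat, av, R = 3)

# R x n matrix of indices: row r = observations drawn for replicate r
idx <- boot.array(av.boot, indices = TRUE)

# Replace each index by the data value it points to
vals <- matrix(dat[as.vector(idx)], nrow = nrow(idx))
vals

# Each row mean should reproduce the corresponding stored replicate
cbind(row.mean = rowMeans(vals), t = av.boot$t[, 1])
```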

library(boot)

set.seed(1234)

dat <- c(1, 10, 100, 1000)
av  <- function(dat, i) { sum(dat[i])/length(dat[i]) }
av.boot <- boot(dat, av, R = 3)
av.boot
#
# ORDINARY NONPARAMETRIC BOOTSTRAP
#
#
# Call:
# boot(data = dat, statistic = av, R = 3)
#
#
# Bootstrap Statistics :
#     original  bias    std. error
# t1*   277.75  -127.5    132.2405
# 
# 

mean(dat) + -127.5
# [1] 150.25

# Copy of boot:::ordinary.array (an internal function of the boot package)

ordinary.array <- function (n, R, strata) 
{
    inds <- as.integer(names(table(strata)))
    if (length(inds) == 1L) {
        output <- sample.int(n, n * R, replace = TRUE)
        dim(output) <- c(R, n)
    }
    else {
        output <- matrix(as.integer(0L), R, n)
        for (is in inds) {
            gp <- seq_len(n)[strata == is]
            output[, gp] <- if (length(gp) == 1) 
                rep(gp, R)
            else boot:::bsample(gp, R * length(gp))  # bsample is also internal to boot
        }
    }
    output
}

# I think the function ordinary.array determines which elements of the
# data are sampled in each of the R samples (row r = indices for replicate r)

set.seed(1234)  # same seed as before the call to boot()
ordinary.array(n = 4, R = 3, strata = 1)

#      [,1] [,2] [,3] [,4]
# [1,]    1    3    1    3
# [2,]    3    4    1    3
# [3,]    3    3    3    3
#
# Mapping indices 1, 3 and 4 to the values 1, 100 and 1000,
# the mean of the three replicate means equals:

((1+100+1+100) / 4  +  (100+1000+1+100) / 4  +  (100+100+100+100) / 4) / 3

# [1] 150.25
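The hand calculation above can be written generally: index the data with the index matrix, average each row to get the replicate means, then average those. This sketch takes the matrix printed above as literal input, so it does not depend on the random-number generator:

```r
dat <- c(1, 10, 100, 1000)

# Index matrix as printed by ordinary.array(n = 4, R = 3, strata = 1) above
idx <- matrix(c(1, 3, 1, 3,
                3, 4, 1, 3,
                3, 3, 3, 3), nrow = 3, byrow = TRUE)

vals      <- matrix(dat[as.vector(idx)], nrow = nrow(idx))  # indices -> values
rep.means <- rowMeans(vals)     # 50.5, 300.25, 100
mean(rep.means)                 # 150.25, the mean of the bootstrap replicates
mean(rep.means) - mean(dat)     # -127.5, the bias printed by boot
```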

Answer 1 (score 0), posted 2015-04-13 14:54:07
