How to estimate a density (empirical pdf) from quantiles (empirical CDF) in R

Question

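Suppose I have an unknown density a.

All I know is a grid of probabilities (probs) and the corresponding quantiles (quants).

How can I generate random samples from the unknown density?

This is what I have so far.

I am trying rejection sampling, but I am not tied to that method. Here I fit a polynomial (degree 6) to the quantiles. The purpose is to turn the discrete quantiles into a smooth continuous function. This gives me an empirical CDF. I then use rejection sampling to draw actual samples from the CDF. Is there a convenient way in R to convert samples from the CDF into samples from the density, or am I going about this in a convoluted way when a better option exists?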

library(tibble)
library(ggplot2)
library(dplyr)  # for %>%

# unknown and probably not normal, but I use rnorm here because it is easy
a <- c(exp(rnorm(200, 5, .8)))
probs <- seq(0.05, 0.95, 0.05)
quants <- quantile(a, probs)
# store the probability grid under the column name used by the model below
df_quants <- tibble::tibble(cum_probs = probs, quants)
fit <- lm(quants ~ poly(cum_probs, 6), df_quants)
df_quants$fit <- predict(fit, df_quants)

# black: observed quantiles, red: polynomial fit
p <- df_quants %>%
  ggplot(aes(x = cum_probs, y = quants)) +
  geom_line(aes(y = quants), color = "black", size = 1) +
  geom_line(aes(y = fit), color = "red", size = 1)

[Figure: CDF]

count = 1
accept = c()
X <- runif(50000, 0, 1)
U <- runif(50000, 0, 1)

# evaluate the fitted polynomial at probability x
estimate <- function(x){
  new_x <- predict(fit, data.frame(cum_probs = c(x)))
  return(new_x)
}

while(count <= 50000 && length(accept) < 40000){
  test_u = U[count]
  test_x = estimate(X[count])/(1000*dunif(X[count], 0, 1))
  if(test_u <= test_x){
    accept = c(accept, X[count])
  }
  count = count + 1  # advance exactly once per proposal
}

p2 <- tibble::tibble(V1 = accept) %>%
  ggplot(aes(x = V1)) +
  geom_histogram(bins = 45)

[Figure: samples from the CDF]

Answer 1

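I don't think rejection sampling is needed. With a B-spline fit I was able to generate reasonable samples by inverse transform, but I also needed a higher-resolution grid. The tails are a bit off.

The assumption I am making here is that a B-spline fitted to a tight grid of quantiles approximates the inverse CDF function. Once this curve is fitted, I can feed it random uniform draws from U[0,1].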
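The inverse transform principle behind this assumption can be checked on a distribution whose quantile function is known exactly. A minimal sketch, using the exponential distribution purely as an illustrative stand-in (it is not part of the problem above):

```r
# Inverse transform sampling: if U ~ Uniform(0, 1) and Q is the quantile
# function of a distribution, then Q(U) follows that distribution.
set.seed(1)
u <- runif(1e5)
x <- qexp(u, rate = 2)   # known quantile function, illustrative choice

# Sample quantiles of x should be close to the theoretical ones.
check_probs <- c(0.1, 0.5, 0.9)
sample_q <- quantile(x, check_probs, names = FALSE)
theory_q <- qexp(check_probs, rate = 2)
print(round(rbind(sample_q, theory_q), 3))
```

The fitted spline plays the role of `qexp` here: it is only an approximation of the unknown quantile function, so the quality of the samples depends on how well it is fitted.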

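library(splines2)
library(tibble)
library(ggplot2)
library(dplyr)  # for %>%

a <- c(exp(rnorm(200, 5, .8)))
cum_probs <- seq(0.01, 0.99, 0.01)
quants <- quantile(a, cum_probs)
df_quants <- tibble::tibble(cum_probs, quants)
# regress the quantiles on a B-spline basis of the probabilities
fit_spline <- lm(quants ~ bSpline(cum_probs, df = 9), df_quants)
df_quants$fit_spline <- predict(fit_spline, df_quants)
# fitted curve, treated as an approximation of the inverse CDF
estimate <- function(x){
  new_x <- predict(fit_spline, data.frame(cum_probs = c(x)))
  return(new_x)
}
# uniform draws; note the fit only covers probabilities 0.01 to 0.99
e <- runif(10000, 0, 1)
y <- estimate(e)
df_density <- tibble(y)
df_densitya <- tibble(a)
# histogram of the generated samples
py <- df_density %>%
  ggplot(aes(x = y)) +
  geom_histogram()
# histogram of the original data
pa <- df_densitya %>%
  ggplot(aes(x = a)) +
  geom_histogram(bins = 45)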

[Figure: original density]

[Figure: inverse transform samples]
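A regression spline is not guaranteed to be monotone, which an inverse CDF should be. As a possible alternative (a sketch, not the method used above), base R's `splinefun()` with `method = "monoH.FC"` interpolates the quantiles monotonically. In this sketch the uniform draws are restricted to the probability range covered by the grid, so the tails beyond it are simply not represented:

```r
set.seed(42)
a <- exp(rnorm(200, 5, 0.8))          # stand-in data, as in the question
cum_probs <- seq(0.01, 0.99, 0.01)
quants <- quantile(a, cum_probs, names = FALSE)

# Monotone cubic interpolation of the quantile function (inverse CDF).
q_fun <- splinefun(cum_probs, quants, method = "monoH.FC")

# Inverse transform sampling, restricted to the grid's probability range
# so the interpolant is never evaluated outside the known quantiles.
u <- runif(10000, min(cum_probs), max(cum_probs))
y <- q_fun(u)

summary(y)
```

Because the interpolant passes through every quantile, sampling noise in the quantiles is reproduced rather than smoothed; whether that is preferable to a regression fit depends on how reliable the quantile grid is.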

Summary statistics

Original dist a

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  20.36   80.84  145.25  195.72  241.22 1285.24

Generated from quantiles y

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  28.09   81.78  149.53  189.07  239.62  667.27
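To get at the density itself rather than samples, one option is the identity f(Q(p)) = 1 / Q'(p), where Q is the quantile function: the density at a quantile is the reciprocal of the slope of the quantile curve there. A rough sketch on the same kind of stand-in data (the monotone interpolation via `splinefun()` is a choice made for this illustration, and the estimate only covers the range spanned by the grid):

```r
set.seed(42)
a <- exp(rnorm(200, 5, 0.8))          # stand-in data, as in the question
cum_probs <- seq(0.01, 0.99, 0.01)
quants <- quantile(a, cum_probs, names = FALSE)

# Monotone interpolation of the quantile function Q(p).
q_fun <- splinefun(cum_probs, quants, method = "monoH.FC")

p_grid <- seq(0.01, 0.99, length.out = 5000)
x_grid <- q_fun(p_grid)               # points on the original scale
slope  <- q_fun(p_grid, deriv = 1)    # Q'(p)
dens   <- 1 / slope                   # f(Q(p)) = 1 / Q'(p)

# Rough check with the trapezoid rule: over this range the exact
# integral of the density is 0.99 - 0.01 = 0.98.
n <- length(dens)
area <- sum(diff(x_grid) * (dens[-1] + dens[-n]) / 2)
area
```

Plotting `dens` against `x_grid` gives the density curve. Since it is the reciprocal of a derivative of noisy quantiles, it can look jagged; a smoother fit of Q trades that jaggedness for some bias.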

Solution 1
0 2020-11-10 14:26:36