How do I generate and sample from a random distribution fitted to observed values in R?

Question

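I have been using the existing function random_steps from the package amt. I need to fit a gamma distribution to the observed values in my data and then sample from it, to give me plausible alternatives to what was observed. The problem is that the observed values range from 0 to 53, but the generated values range from 0 to 522. Clearly the function is giving me values that are not just implausible but impossible.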

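I do not want to go through the source code ( https://github.com/jmsigner/amt/blob/master/R/random_steps.R ) looking for something to fix, so I am hoping that someone can give me an alternative so that I can leave amt behind. However, I have not been able to find a solution elsewhere. Surely there is a simple way to fit a distribution to existing values and then sample from it?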

These are density plots of the values observed in the data (left) and the values generated by random_steps (right).

My dataset has over 2 million rows, which makes it hard for me to show exactly what is happening. Basically, the code is as follows:

library(amt)
library(dplyr)      # %>%
library(lubridate)  # minutes()
library(purrr)      # reduce()

stepTime <- 60
toleranceTime <- 15
# stepNumber (number of control steps per observed step) must be defined before this runs

tracks <- lapply(split(df, df$name), function(x) {

  # make animal tracks and resample to consistent times
  # (x is the subset of df for one animal)
  trk <- mk_track(x, .x = long, .y = lat, .t = timestamp, id = name) %>%
    track_resample(rate = minutes(stepTime), tolerance = minutes(toleranceTime))

  # burst steps
  burst <- steps_by_burst(trk, keep_cols = "start")

  # create random steps using fitted gamma and von mises distributions and append
  burst %>% random_steps(n_control = stepNumber)

}) %>% reduce(rbind)

Of the alternative values it produces, 4.03% are greater than the highest observed value.

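When I generate an entirely new distribution as suggested below, I get a very nice curve that lies entirely within the range of possible values. However, when I draw values from it to create my sample of possible values to compare against the observed ones, the density at the lower end is so high that all of the sampled values fall below 1. My observed values range from 0 to 53, while my alternatives range from 0 to 1.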

Any tips on how to obtain a distribution that reflects reality more closely?

Many thanks!

Answer 1

I am not familiar with random_steps, but fitting a gamma distribution to a set of observed values is not difficult. Here is one way to do it.

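If the gamma distribution is parameterised in terms of scale and shape, the mean is given by shape * scale and the variance by shape * scale * scale. So compute the mean and variance of the sample, derive the scale as variance / mean, and hence shape = mean / scale. This approach is known as the method of moments.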
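In symbols, the moment-matching step can be written as below. The names k (shape), \theta (scale), \bar{x} (sample mean) and s^2 (sample variance) are notation assumed here, not taken from the post.

```latex
\begin{align*}
\mathbb{E}[X] &= k\theta, & \operatorname{Var}[X] &= k\theta^{2}, \\
\hat{\theta} &= \frac{s^{2}}{\bar{x}}, & \hat{k} &= \frac{\bar{x}}{\hat{\theta}} = \frac{\bar{x}^{2}}{s^{2}}.
\end{align*}
```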

Here is a working example.

library(tidyverse)

# Generate some gamma data
set.seed(123)
d <- tibble(x = rgamma(1000, shape = 3, scale = 1.5))

# Calculate summary statistics
stats <- d %>% summarise(Mean = mean(x), Variance = var(x))
stats
# # A tibble: 1 x 2
#    Mean Variance
#   <dbl>    <dbl>
# 1  4.32     5.65

# Derive parameter estimates (method of moments)
scale <- stats$Variance / stats$Mean
shape <- stats$Mean / scale

scale
# [1] 1.307148
shape
# [1] 3.305911

# Derive fitted PDF and compare with empirical PDF
fitted <- tibble(x = seq(0, 15, 0.25), y = dgamma(x, shape = shape, scale = scale))
d %>%
  ggplot() +
  geom_histogram(aes(x = x, y = after_stat(density)), bins = 20) +
  geom_line(data = fitted, aes(x = x, y = y), colour = "blue")

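If a significant discrepancy remains between your observed data and the fitted distribution, this suggests that your observations do not follow a gamma distribution.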
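Once the parameters are estimated, sampling from the fitted distribution takes a single call to rgamma. The following is a minimal sketch in base R under stated assumptions: obs is simulated stand-in data rather than the asker's step lengths, and the names obs, shape_hat, scale_hat, n_draws and draws are illustrative only. It also reports the share of draws above the largest observed value, since a gamma distribution has no upper bound, and compares a few quantiles as a rough check of fit.

```r
# Sketch only: obs is simulated here and stands in for real observed values
set.seed(123)
obs <- rgamma(1000, shape = 3, scale = 1.5)

# Method-of-moments estimates, using the same formulas as the answer
scale_hat <- var(obs) / mean(obs)
shape_hat <- mean(obs) / scale_hat

# Draw candidate alternatives from the fitted gamma distribution
n_draws <- 5000
draws <- rgamma(n_draws, shape = shape_hat, scale = scale_hat)

# Share of draws above the largest observed value
frac_above_max <- mean(draws > max(obs))
print(frac_above_max)

# Rough fit check: observed quantiles next to fitted gamma quantiles
probs <- c(0.05, 0.25, 0.50, 0.75, 0.95)
comparison <- data.frame(
  prob     = probs,
  observed = unname(quantile(obs, probs)),
  fitted   = qgamma(probs, shape = shape_hat, scale = scale_hat)
)
print(comparison)
```

Draws beyond the observed maximum are a property of the gamma family rather than of the fitting code. If they are unacceptable for the application, a bounded or truncated distribution may be a better match, which goes beyond what the answer covers.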

Solution 1
0 2021-03-08 13:50:31
