Conditionally overwriting lists using the purrr package in R?

Question

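Suppose I have the dataset below. It contains the non-centrality parameter (NCP), degrees of freedom (DF), and number of simulations (10,000) for each party's candidate in three states. As you can see, some races have no candidate from a given party: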

library(tidyverse)  # provides tibble(), mutate(), %>% and purrr's pmap()

dat <- tibble(state = c("Iowa", "Wisconsin", "Minnesota"),
              ncp_D = c(0, 11000, 5700),
              ncp_R = c(10000, 12000, 5000),
              ncp_Ind = c(1800, 0, 600),
              df_D = c(10),
              df_R = c(10),
              df_Ind = c(10),
              sims_D = c(10000),
              sims_R = c(10000),
              sims_Ind = c(10000))

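I would like the code to use the purrr package to generate 10,000 simulations for each candidate in the three states. Below is the code I used to start this process, drawing from the t distribution (rt()):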

dat_results <- dat %>% 
  mutate(DVotes = pmap(list(sims_D, df_D, ncp_D), rt),
         RVotes = pmap(list(sims_R, df_R, ncp_R), rt),
         IndVotes = pmap(list(sims_Ind, df_Ind, ncp_Ind), rt))

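This produces three list-columns of possible vote outcomes in the dat_results data frame, but ultimately I want a candidate's list to be filled with zeros whenever its ncp value is zero. For example, the D candidate in Iowa should get 10,000 zeros rather than values drawn by rt() with an NCP of 0, which produces some negative values. The same goes for the Ind candidate in Wisconsin. Essentially, I am trying to conditionally overwrite lists inside a data frame.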

Is there a simple way to do this in R, preferably with the purrr package? Thanks in advance.

Answer 1

In your case, I think the simplest approach is to wrap the rt() function:

cond_rt <- function(n, df, ncp, ...){
  if(ncp == 0) return(rep(0, n))
  rt(n, df, ncp, ...)
}

Then just use the modified version:

dat_results <- dat %>% 
  mutate(DVotes = pmap(list(sims_D, df_D, ncp_D), cond_rt),
         RVotes = pmap(list(sims_R, df_R, ncp_R), cond_rt),
         IndVotes = pmap(list(sims_Ind, df_Ind, ncp_Ind), cond_rt))

map_dbl(dat_results$DVotes, length)
#> [1] 10000 10000 10000
map_dbl(dat_results$DVotes, sum)
#> [1]         0 119262980  61756273
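
The three pmap() calls differ only in the party suffix, so they could also be generated in a loop. The following is a minimal sketch, assuming the column naming scheme from the question (sims_*, df_*, ncp_*); parties, simulate_party, votes, and dat_results3 are names made up for this illustration and are not part of the original answer:

```r
library(tidyverse)

dat <- tibble(state = c("Iowa", "Wisconsin", "Minnesota"),
              ncp_D = c(0, 11000, 5700),
              ncp_R = c(10000, 12000, 5000),
              ncp_Ind = c(1800, 0, 600),
              df_D = 10, df_R = 10, df_Ind = 10,
              sims_D = 10000, sims_R = 10000, sims_Ind = 10000)

# Same wrapper as above: all zeros when there is no candidate (ncp == 0)
cond_rt <- function(n, df, ncp, ...) {
  if (ncp == 0) return(rep(0, n))
  rt(n, df, ncp, ...)
}

# Hypothetical helper vector of party suffixes
parties <- c("D", "R", "Ind")

# Build one list-column for a party by looking up its input columns by name
simulate_party <- function(p) {
  pmap(list(dat[[paste0("sims_", p)]],
            dat[[paste0("df_", p)]],
            dat[[paste0("ncp_", p)]]),
       cond_rt)
}

# Named list of list-columns: DVotes, RVotes, IndVotes
votes <- set_names(map(parties, simulate_party), paste0(parties, "Votes"))
dat_results3 <- bind_cols(dat, as_tibble(votes))
```

Adding a party then only requires extending the suffix vector, provided the matching sims_*, df_*, and ncp_* columns exist.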

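However, if you really want to conditionally modify the columns after the fact, you can do it with mutate() and if_else(). This runs into a problem because we need to read and write list elements. That can be solved by using rowwise() (which reads the elements of a single row at a time) and calling list() on the output, so that we get a length-1 list that can be inserted as an element.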


dat_results2 <- dat %>% 
  mutate(DVotes = pmap(list(sims_D, df_D, ncp_D), rt),
         RVotes = pmap(list(sims_R, df_R, ncp_R), rt),
         IndVotes = pmap(list(sims_Ind, df_Ind, ncp_Ind), rt)) %>%
  rowwise() %>%
  mutate(DVotes = if_else(ncp_D == 0, list(rep(0, length(DVotes))), list(DVotes)),
         RVotes = if_else(ncp_R == 0, list(rep(0, length(RVotes))), list(RVotes)),
         IndVotes = if_else(ncp_Ind == 0, list(rep(0, length(IndVotes))), list(IndVotes)))

map_dbl(dat_results2$DVotes, length)
#> [1] 10000 10000 10000
map_dbl(dat_results2$DVotes, sum)
#> [1]         0 119172966  61629269

This could probably be simplified with across().
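
Since the question asks for a purrr solution, another option worth noting is map_if(), which accepts a logical vector as its predicate and so avoids both rowwise() and the list() wrapping. This is a sketch rather than part of the original answer; it covers only two parties and uses 1,000 draws to stay short, and dat_small and dat_results4 are illustrative names:

```r
library(tidyverse)

# Reduced version of the question's data (two parties only)
dat_small <- tibble(state = c("Iowa", "Wisconsin", "Minnesota"),
                    ncp_D = c(0, 11000, 5700),
                    ncp_Ind = c(1800, 0, 600))

dat_results4 <- dat_small %>%
  # Simulate for every row first; pmap() recycles the length-1 arguments
  mutate(DVotes = pmap(list(1000, 10, ncp_D), rt),
         IndVotes = pmap(list(1000, 10, ncp_Ind), rt)) %>%
  # Overwrite only the elements whose ncp is zero; the rest pass through unchanged
  mutate(DVotes = map_if(DVotes, ncp_D == 0, ~ rep(0, length(.x))),
         IndVotes = map_if(IndVotes, ncp_Ind == 0, ~ rep(0, length(.x))))

# Count the non-zero draws per state for the D candidate
map_dbl(dat_results4$DVotes, ~ sum(.x != 0))
```

The count for Iowa's D candidate should be zero, while the other states keep their simulated draws.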
