Transforming an R data frame by applying a function rowwise and creating (possibly) larger columns

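I am trying to transform a data frame (tibble) by using each row as the arguments to a function and creating a new column from the result, which may be longer than the number of arguments. Consider the following example, where I have some sample observations: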

library(dplyr)
library(stringi)

observations <- c("110", "11011", "1100010")

df <- tibble(obs = observations) %>%
    transmute(
        Failure = stri_count(obs, fixed = "0"),
        Success = stri_count(obs, fixed = "1")
    )

df is then:

# A tibble: 3 x 2
  Failure Success
    <int>  <int>
1       1      2
2       1      4
3       4      3

I want to take each row and use it to compute a bunch of values, saving each result vector in a new column. For example, I would like to do:

p_values <- seq(from = 0, to = 1, length.out = 11)

df %>%
    rowwise() %>%
    transmute(
        p = p_values,
        likelihood = dbinom(Success,
            size = Failure + Success,
            prob = p_values
        )
    )

Error: Column `p` must be length 1 (the group size), not 11

and get something like:

# A tibble: 11 x 4
   p_values likelihood_1 likelihood_2 likelihood_3
      <dbl>        <dbl>        <dbl>        <dbl>
 1      0            ...          ...          ...
 2      0.1          ...          ...          ...
...     ...          ...          ...          ...
10      0.9          ...          ...          ...
11      1            ...          ...          ...

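This kind of workflow can be a bit awkward with the tidyverse approach, because the data is not in a "tidy" format.

I would approach it from the other direction, starting from the p_values vector: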

likelihoods <- 
  tibble(p = p_values) %>%
  mutate(likelihood_1 = dbinom(df[1,]$Success,size = df[1,]$Failure + df[1,]$Success,prob = p),
         likelihood_2 = dbinom(df[2,]$Success,size = df[2,]$Failure + df[2,]$Success,prob = p),
         likelihood_3 = dbinom(df[3,]$Success,size = df[3,]$Failure + df[3,]$Success,prob = p))
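
The three hand-written columns above follow the same pattern, so they could be generated in a loop instead. The following is only a sketch in base R, not part of the original answer: `counts` is a hypothetical stand-in for `df` (rebuilt from the counts shown in the question so the snippet runs on its own), and `sapply()` is used here in place of the repeated `mutate()` arguments.

```r
# Grid of candidate probabilities, as in the question
p_values <- seq(from = 0, to = 1, length.out = 11)

# Hypothetical stand-in for `df`, using the counts printed in the question
counts <- data.frame(Failure = c(1L, 1L, 4L), Success = c(2L, 4L, 3L))

# One likelihood vector per row of `counts`; sapply() binds them as matrix columns
lik <- sapply(seq_len(nrow(counts)), function(i) {
  dbinom(counts$Success[i],
         size = counts$Failure[i] + counts$Success[i],
         prob = p_values)
})
colnames(lik) <- paste0("likelihood_", seq_len(ncol(lik)))

# Combine the grid and the likelihood columns into one data frame
likelihoods <- data.frame(p = p_values, lik)
```

This scales to any number of rows without editing the code, at the cost of leaving the pipe-based style.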

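The issue is that transmute/mutate expects the number of elements to be the same as the number of rows (or, if the data is grouped, the number of rows in that group). Here we use rowwise, which basically groups by each row, so the expected n() is 1, while the output has the length of 'p_values'. One option is to wrap the results in a list, unnest them, and reshape to "wide" format with pivot_wider (if needed):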

library(dplyr)
library(tidyr)
library(stringr)

df %>%
    mutate(grp = str_c('likelihood_', row_number())) %>%
    rowwise() %>%
    transmute(
        grp,
        p = list(p_values),
        likelihood = list(dbinom(Success,
            size = Failure + Success,
            prob = p_values
        ))
    ) %>%
    unnest(c(p, likelihood)) %>%
    pivot_wider(names_from = grp, values_from = likelihood)
# A tibble: 11 x 4
#       p likelihood_1 likelihood_2 likelihood_3
#   <dbl>        <dbl>        <dbl>        <dbl>
# 1   0          0          0            0      
# 2   0.1        0.027      0.00045      0.0230 
# 3   0.2        0.096      0.0064       0.115  
# 4   0.3        0.189      0.0284       0.227  
# 5   0.4        0.288      0.0768       0.290  
# 6   0.5        0.375      0.156        0.273  
# 7   0.6        0.432      0.259        0.194  
# 8   0.7        0.441      0.360        0.0972 
# 9   0.8        0.384      0.410        0.0287 
#10   0.9        0.243      0.328        0.00255
#11   1          0          0            0      
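
If dplyr 1.1.0 or later is available, `reframe()` may be a shorter route, since it allows each group to return any number of rows and so avoids the list/unnest step. This is a sketch under that version assumption and is not part of the original answer; `counts` is a hypothetical stand-in for `df`, rebuilt from the counts shown in the question, and `wide` is an illustrative name.

```r
library(dplyr)
library(tidyr)

# Grid of candidate probabilities, as in the question
p_values <- seq(from = 0, to = 1, length.out = 11)

# Hypothetical stand-in for `df`, using the counts printed in the question
counts <- tibble(Failure = c(1L, 1L, 4L), Success = c(2L, 4L, 3L))

wide <- counts %>%
  mutate(grp = paste0("likelihood_", row_number())) %>%
  group_by(grp) %>%                 # one group per original row
  reframe(                          # each group may return 11 rows
    p = p_values,
    likelihood = dbinom(Success, size = Failure + Success, prob = p_values)
  ) %>%
  pivot_wider(names_from = grp, values_from = likelihood)
```

Grouping by the per-row label keeps `grp` in the result, which `pivot_wider()` then uses for the column names.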

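I would actually switch to purrr for this. The function pmap() iterates rowwise. We use ..1 and ..2 to refer to the first and second inputs respectively. Using pmap_dfc() binds the results by column (dfc = data frame columns).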

library(purrr)
library(tibble)

df %>%
  pmap_dfc(~ dbinom(..2, size = ..1 + ..2, prob = p_values)) %>%
  set_names(paste0("likelihood_", seq_along(.))) %>%
  add_column(p_values = p_values, .before = 1)
# A tibble: 11 x 4
   p_values likelihood_1 likelihood_2 likelihood_3
      <dbl>        <dbl>        <dbl>        <dbl>
 1      0          0          0            0      
 2      0.1        0.027      0.00045      0.0230 
 3      0.2        0.096      0.0064       0.115  
 4      0.3        0.189      0.0284       0.227  
 5      0.4        0.288      0.0768       0.290  
 6      0.5        0.375      0.156        0.273  
 7      0.6        0.432      0.259        0.194  
 8      0.7        0.441      0.360        0.0972 
 9      0.8        0.384      0.410        0.0287 
10      0.9        0.243      0.328        0.00255
11      1          0          0            0 
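
Note that purrr 1.0.0 marks `pmap_dfc()` and the other `_dfc`/`_dfr` variants as superseded. The same result can be sketched with plain `pmap()`, naming the list before converting it to a tibble. This is an illustrative variant rather than the original answer's code; `counts` is a hypothetical stand-in for `df`, and the anonymous function's argument names are matched to the column names.

```r
library(purrr)
library(tibble)

# Grid of candidate probabilities, as in the question
p_values <- seq(from = 0, to = 1, length.out = 11)

# Hypothetical stand-in for `df`, using the counts printed in the question
counts <- tibble(Failure = c(1L, 1L, 4L), Success = c(2L, 4L, 3L))

# pmap() passes each row's columns as named arguments, returning a list of vectors
lik <- pmap(counts, function(Failure, Success) {
  dbinom(Success, size = Failure + Success, prob = p_values)
})
names(lik) <- paste0("likelihood_", seq_along(lik))

# A named list of equal-length vectors converts directly to a tibble
result <- as_tibble(c(list(p_values = p_values), lik))
```

Using named arguments instead of `..1` and `..2` makes the call independent of column order.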
