Transforming an R data frame by applying a function row-wise and creating (possibly) longer columns

I'm trying to transform a data frame (tibble) by using each row as the arguments to a function and creating a new column from the result, which may be longer than the number of arguments. Consider the following example, where I have some sample observations:

library(dplyr)
library(stringi)

observations <- c("110", "11011", "1100010")

df <- tibble(obs = observations) %>%
    transmute(
        Failure = stri_count(obs, fixed = "0"),
        Success = stri_count(obs, fixed = "1")
    )

df is then:

# A tibble: 3 x 2
  Failure Success
    <int>  <int>
1       1      2
2       1      4
3       4      3

I would like to take each row, use it to calculate a vector of values, and save each result vector in a new column. For example, I would like to do:

p_values <- seq(from = 0, to = 1, length.out = 11)

df %>%
    rowwise() %>%
    transmute(
        p = p_values,
        likelihood = dbinom(Success,
            size = Failure + Success,
            prob = p_values
        )
    )

Error: Column `p` must be length 1 (the group size), not 11

Instead, I would like to get something like:

# A tibble: 11 x 4
   p_values likelihood_1 likelihood_2 likelihood_3
      <dbl>        <dbl>        <dbl>        <dbl>
 1      0            ...          ...          ...
 2      0.1          ...          ...          ...
...     ...          ...          ...          ...
10      0.9          ...          ...          ...
11      1            ...          ...          ...

This sort of workflow can be somewhat awkward with a tidyverse approach, as the data is not in a 'tidy' format.

I would come at it from the other angle, starting with the p_values vector:

likelihoods <-
  tibble(p = p_values) %>%
  mutate(
    likelihood_1 = dbinom(df[1, ]$Success, size = df[1, ]$Failure + df[1, ]$Success, prob = p),
    likelihood_2 = dbinom(df[2, ]$Success, size = df[2, ]$Failure + df[2, ]$Success, prob = p),
    likelihood_3 = dbinom(df[3, ]$Success, size = df[3, ]$Failure + df[3, ]$Success, prob = p)
  )
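The three hard-coded columns above can probably be generated for any number of rows. The following is a minimal base R sketch, not part of the original answer: it rebuilds the counts from the question so it runs on its own, and the argument names failure and success are illustrative.

```r
# Counts from the question, rebuilt here so the sketch is self-contained
df <- data.frame(Failure = c(1L, 1L, 4L), Success = c(2L, 4L, 3L))
p_values <- seq(from = 0, to = 1, length.out = 11)

# mapply() walks over the rows in parallel; each call returns a vector as long
# as p_values, and the results are bound column-wise into a matrix
lik <- mapply(
  function(failure, success) {
    dbinom(success, size = failure + success, prob = p_values)
  },
  df$Failure, df$Success
)
colnames(lik) <- paste0("likelihood_", seq_len(nrow(df)))

# data.frame() expands the matrix into one column per observation
likelihoods <- data.frame(p = p_values, lik)
```

If a tibble is preferred, tibble::as_tibble(likelihoods) should convert the result without changing its shape.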

The issue is that transmute or mutate expects the number of elements to be the same as the number of rows (or, if the data is grouped, the number of rows in that group). Here we use rowwise, which effectively groups by each row, so the expected n() is 1, whereas the output has the length of p_values. One option is to wrap the result in a list, unnest it, and reshape to 'wide' format with pivot_wider (if needed):

library(dplyr)
library(tidyr)
library(stringr)
df %>%
    mutate(grp = str_c('likelihood_', row_number())) %>%
    rowwise() %>%
    transmute(
        grp,
        p = list(p_values),
        likelihood = list(dbinom(Success,
            size = Failure + Success,
            prob = p_values
        ))
    ) %>%
    unnest(c(p, likelihood)) %>%
    pivot_wider(names_from = grp, values_from = likelihood)
# A tibble: 11 x 4
#       p likelihood_1 likelihood_2 likelihood_3
#   <dbl>        <dbl>        <dbl>        <dbl>
# 1   0          0          0            0      
# 2   0.1        0.027      0.00045      0.0230 
# 3   0.2        0.096      0.0064       0.115  
# 4   0.3        0.189      0.0284       0.227  
# 5   0.4        0.288      0.0768       0.290  
# 6   0.5        0.375      0.156        0.273  
# 7   0.6        0.432      0.259        0.194  
# 8   0.7        0.441      0.360        0.0972 
# 9   0.8        0.384      0.410        0.0287 
#10   0.9        0.243      0.328        0.00255
#11   1          0          0            0      
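A variation that avoids list columns altogether is to build the long format first, with one row per combination of observation and probability, and let the vectorised dbinom() do the work. This is only a sketch under the assumption that tidyr::expand_grid() is available (tidyr 1.0 or later); the names long and wide are illustrative.

```r
library(dplyr)
library(tidyr)

# Counts from the question, rebuilt here so the sketch is self-contained
df <- tibble(Failure = c(1L, 1L, 4L), Success = c(2L, 4L, 3L))
p_values <- seq(from = 0, to = 1, length.out = 11)

long <- df %>%
  mutate(grp = paste0("likelihood_", row_number())) %>%
  # one row per (observation, p) pair: 3 x 11 = 33 rows
  expand_grid(p = p_values) %>%
  # dbinom() is vectorised over x, size and prob, so no rowwise() is needed
  mutate(likelihood = dbinom(Success, size = Failure + Success, prob = p))

wide <- long %>%
  select(p, grp, likelihood) %>%
  pivot_wider(names_from = grp, values_from = likelihood)
```

The intermediate long table is already 'tidy', so it can be passed directly to plotting or summarising code if the wide layout is not needed.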

I would actually switch to purrr for this. The function pmap() iterates by row. We use ..1 and ..2 to refer to the first and second inputs, here the Failure and Success columns respectively. Using pmap_dfc() binds the results by column (dfc = data frame columns).

library(purrr)
library(tibble)

df %>%
  pmap_dfc(~ dbinom(..2, size = ..1 + ..2, prob = p_values)) %>%
  set_names(paste0("likelihood_", seq_along(.))) %>%
  add_column(p_values = p_values, .before = 1)
# A tibble: 11 x 4
   p_values likelihood_1 likelihood_2 likelihood_3
      <dbl>        <dbl>        <dbl>        <dbl>
 1      0          0          0            0      
 2      0.1        0.027      0.00045      0.0230 
 3      0.2        0.096      0.0064       0.115  
 4      0.3        0.189      0.0284       0.227  
 5      0.4        0.288      0.0768       0.290  
 6      0.5        0.375      0.156        0.273  
 7      0.6        0.432      0.259        0.194  
 8      0.7        0.441      0.360        0.0972 
 9      0.8        0.384      0.410        0.0287 
10      0.9        0.243      0.328        0.00255
11      1          0          0            0 
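The positional ..1 and ..2 depend on the column order of df. A sketch that matches arguments by column name instead is shown below; it is a variation on the answer above rather than the original code. It also uses plain pmap() followed by as_tibble(), since, as far as I know, pmap_dfc() is marked as superseded from purrr 1.0.0 onwards.

```r
library(purrr)
library(tibble)

# Counts from the question, rebuilt here so the sketch is self-contained
df <- tibble(Failure = c(1L, 1L, 4L), Success = c(2L, 4L, 3L))
p_values <- seq(from = 0, to = 1, length.out = 11)

likelihoods <- df %>%
  # pmap() passes each row's values as named arguments, so the function
  # parameters must match the column names exactly
  pmap(function(Failure, Success) {
    dbinom(Success, size = Failure + Success, prob = p_values)
  }) %>%
  set_names(paste0("likelihood_", seq_along(.))) %>%
  # a named list of equal-length vectors converts directly to a tibble
  as_tibble() %>%
  add_column(p_values = p_values, .before = 1)
```

If df had extra columns, the function would need a ... argument to absorb them, because pmap() passes every column to the function.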
