Transforming an R data frame by applying a function rowwise and creating (possibly) larger columns
I'm trying to transform a data frame (tibble) by using each row as function arguments and creating a new column from the result, which may be longer than the number of arguments.
Consider the following example, where I have some sample observations:
library(dplyr)
library(stringi)
observations <- c("110", "11011", "1100010")
df <- tibble(obs = observations) %>%
transmute(
Failure = stri_count(obs, fixed = "0"),
Success = stri_count(obs, fixed = "1")
)
df is then:
# A tibble: 3 x 2
Failure Success
<int> <int>
1 1 2
2 1 4
3 4 3
I would like to take every row, use it to calculate a vector of values, and save each result vector in a new column.
For example, I would like to do:
p_values <- seq(from = 0, to = 1, length.out = 11)
df %>%
rowwise() %>%
transmute(
p = p_values,
likelihood = dbinom(Success,
size = Failure + Success,
prob = p_values
)
)
Error: Column `p` must be length 1 (the group size), not 11
Instead, I would like to get something like:
# A tibble: 11 x 4
p_values likelihood_1 likelihood_2 likelihood_3
<dbl> <dbl> <dbl> <dbl>
1 0 ... ... ...
2 0.1 ... ... ...
... ... ... ... ...
10 0.9 ... ... ...
11 1 ... ... ...
This sort of workflow can be somewhat awkward with a tidyverse approach, as the data is not in a 'tidy' format.
I would come at it from the other angle, starting with the p_values vector:
likelihoods <-
  tibble(p = p_values) %>%
  mutate(
    likelihood_1 = dbinom(df[1,]$Success, size = df[1,]$Failure + df[1,]$Success, prob = p),
    likelihood_2 = dbinom(df[2,]$Success, size = df[2,]$Failure + df[2,]$Success, prob = p),
    likelihood_3 = dbinom(df[3,]$Success, size = df[3,]$Failure + df[3,]$Success, prob = p)
  )
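The approach above hard-codes three row indices. As a minimal sketch of the same idea for any number of rows, base R's sapply() can build one likelihood column per row. The data frame is rebuilt here only so the example is self-contained, and the loop variable i and the helper name lik are illustrative choices, not part of the original answer.

```r
# Counts from the question's df, rebuilt so the sketch runs on its own
df <- data.frame(Failure = c(1L, 1L, 4L), Success = c(2L, 4L, 3L))
p_values <- seq(from = 0, to = 1, length.out = 11)

# One likelihood vector per row of df; sapply() binds them as matrix columns
lik <- sapply(seq_len(nrow(df)), function(i) {
  dbinom(df$Success[i], size = df$Failure[i] + df$Success[i], prob = p_values)
})
colnames(lik) <- paste0("likelihood_", seq_len(nrow(df)))

# Put the grid of p values in front of the likelihood columns
likelihoods <- data.frame(p = p_values, lik)
```

The result has one row per value of p and one likelihood column per row of df, so it should not need editing when df grows.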
The issue is that transmute or mutate expects the number of elements to be the same as the number of rows (or, if the data is grouped, the number of rows in that group). Here we use rowwise, which basically treats each row as its own group, so the expected n() is 1, whereas the output has the length of p_values. One option is to wrap the values in a list, unnest, and reshape to 'wide' format with pivot_wider (if needed):
library(dplyr)
library(tidyr)
library(stringr)
df %>%
mutate(grp = str_c('likelihood_', row_number())) %>%
rowwise() %>%
transmute(grp, p = list(p_values),
likelihood = list(dbinom(Success,
size = Failure + Success,
prob = p_values
))
) %>%
unnest(c(p, likelihood)) %>%
pivot_wider(names_from = grp, values_from = likelihood)
# A tibble: 11 x 4
# p likelihood_1 likelihood_2 likelihood_3
# <dbl> <dbl> <dbl> <dbl>
# 1 0 0 0 0
# 2 0.1 0.027 0.00045 0.0230
# 3 0.2 0.096 0.0064 0.115
# 4 0.3 0.189 0.0284 0.227
# 5 0.4 0.288 0.0768 0.290
# 6 0.5 0.375 0.156 0.273
# 7 0.6 0.432 0.259 0.194
# 8 0.7 0.441 0.360 0.0972
# 9 0.8 0.384 0.410 0.0287
#10 0.9 0.243 0.328 0.00255
#11 1 0 0 0
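The length rule behind the original error can be checked directly in base R. This small sketch uses the first row of the question's data (2 successes, 1 failure) and shows why wrapping the result in list() satisfies a rowwise group of size 1. The variable name lik is illustrative.

```r
p_values <- seq(from = 0, to = 1, length.out = 11)

# First row of the question's df: 2 successes out of 3 trials
lik <- dbinom(2, size = 3, prob = p_values)

length(lik)        # 11 values, but a rowwise group has room for only 1
length(list(lik))  # 1: the whole vector becomes a single list element
```

That single list element is what unnest() later expands back into 11 rows.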
I would actually switch to purrr for this. The function pmap() iterates by row. We use ..1 and ..2 to refer to the first and second inputs, respectively. Using pmap_dfc() binds the results by column (dfc = data frame columns).
library(purrr)
library(tibble)
df %>%
pmap_dfc(~ dbinom(..2, size = ..1 + ..2, prob = p_values)) %>%
set_names(paste0("likelihood_", seq_along(.))) %>%
add_column(p_values = p_values, .before = 1)
# A tibble: 11 x 4
p_values likelihood_1 likelihood_2 likelihood_3
<dbl> <dbl> <dbl> <dbl>
1 0 0 0 0
2 0.1 0.027 0.00045 0.0230
3 0.2 0.096 0.0064 0.115
4 0.3 0.189 0.0284 0.227
5 0.4 0.288 0.0768 0.290
6 0.5 0.375 0.156 0.273
7 0.6 0.432 0.259 0.194
8 0.7 0.441 0.360 0.0972
9 0.8 0.384 0.410 0.0287
10 0.9 0.243 0.328 0.00255
11 1 0 0 0
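As far as I know, pmap_dfc() is marked as superseded from purrr 1.0.0 onward, although it still works. A possible variation, offered as a sketch rather than as the answer's own method, uses plain pmap() with named function arguments and then assembles the tibble by hand. The data is rebuilt here so the example is self-contained, and the names lik and result are illustrative.

```r
library(purrr)
library(tibble)

# Counts from the question's df, rebuilt so the sketch runs on its own
df <- tibble(Failure = c(1L, 1L, 4L), Success = c(2L, 4L, 3L))
p_values <- seq(from = 0, to = 1, length.out = 11)

# pmap() passes each row's columns to the function, matched by name
lik <- pmap(df, function(Failure, Success) {
  dbinom(Success, size = Failure + Success, prob = p_values)
})
names(lik) <- paste0("likelihood_", seq_along(lik))

# Combine the p grid and the per-row likelihood vectors into one tibble
result <- as_tibble(c(list(p_values = p_values), lik))
```

Naming the function arguments after the columns avoids relying on positional ..1 and ..2, which can be easier to read if df has more than two columns in a different order.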