Using map_dbl with a nested df not accessing the dataframe correctly?
I'm working on a project where I need to find the distance between a set of behaviors measured in 3-dimensional space and a pre-identified point in that same space. I wrote a function that calculates the distance between the point and a single behavior, and it works when I apply it to one behavior. However, I need to apply it to ~750 behaviors in a larger data frame, so I am hoping to nest the larger behaviors data frame by term and then apply the function to each of the nested data frames using map_dbl. However, I keep getting the error:
Error: Problem with `mutate()` column `distance`.
ℹ `distance = map_dbl(data, calc_distance_from_beh)`.
x Join columns must be present in data.
x Problem with `dim`.
ℹ The error occurred in row 1.
It seems that when map_dbl is applied to the nested data frames, it cannot access the "dim" column to join on, and I'm not sure why.
I've included a reproducible example below with just two behaviors.
Reproducible example:
library(dplyr)
library(purrr)

behaviors <- tibble(term = rep(c("abandon", "abet"), each = 3),
                    estimate = c(-3.31, -0.08, -0.11, 0.03, 0.34, -0.18),
                    dim = c("E", "P", "A", "E", "P", "A"))

optimal_behavior <- tibble(actor = "civil_engineer",
                           object = "civil_engineer",
                           opt_beh = c(1.905645, 0.9960085, -0.17772678),
                           dim = c("E", "P", "A"))

# Sum of squared differences between one behavior and the optimal point
calc_distance_from_beh <- function(nested_df){
  optimal_behavior <- as_tibble(optimal_behavior)
  nested_df <- as_tibble(nested_df)
  df_for_calculations <- left_join(optimal_behavior, nested_df, by = "dim")
  df_for_calculations %>%
    mutate(dist = (estimate - opt_beh)^2) %>%
    summarise(total_dist = sum(dist)) %>%
    pull()
}

# This is the call that raises the error shown above
behaviors_distance <- behaviors %>%
  nest_by(term) %>%
  mutate(distance = map_dbl(data, calc_distance_from_beh))
If the 'value' column is named estimate, just call ungroup after nest_by. nest_by returns a rowwise data frame, so inside mutate, data refers to the current row's tibble rather than to the whole list-column, and map_dbl ends up iterating over that tibble's columns instead of over the nested data frames:
library(purrr)
library(dplyr)

behaviors %>%
  nest_by(term) %>%
  ungroup() %>%
  mutate(distance = map_dbl(data, calc_distance_from_beh))
# A tibble: 2 × 3
term data distance
<chr> <list<tibble[,2]>> <dbl>
1 abandon [3 × 2] 28.4
2 abet [3 × 2] 3.95
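To see what the rowwise grouping changes, the following sketch records the class of whatever map receives in each case. The names rowwise_view, ungrouped_view and seen_by_map are made up for this illustration and are not part of the original code; the behaviors data is repeated so the block runs on its own.

```r
library(dplyr)
library(purrr)
library(tibble)

behaviors <- tibble(
  term = rep(c("abandon", "abet"), each = 3),
  estimate = c(-3.31, -0.08, -0.11, 0.03, 0.34, -0.18),
  dim = c("E", "P", "A", "E", "P", "A")
)

# Rowwise (straight from nest_by): `data` is the current row's tibble,
# so map_chr() walks over its columns (estimate and dim).
rowwise_view <- behaviors %>%
  nest_by(term) %>%
  mutate(seen_by_map = list(map_chr(data, ~ class(.x)[1]))) %>%
  ungroup()

# After ungroup(): `data` is the whole list-column,
# so map_chr() walks over one nested tibble per term.
ungrouped_view <- behaviors %>%
  nest_by(term) %>%
  ungroup() %>%
  mutate(seen_by_map = map_chr(data, ~ class(.x)[1]))

rowwise_view$seen_by_map[[1]]  # classes of the columns inside the first tibble
ungrouped_view$seen_by_map     # class of each nested tibble
```

In the rowwise case each piece handed to the function is a bare column vector, which is why the join on "dim" inside calc_distance_from_beh cannot find that column.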
Or, instead of map, we may apply the function directly in mutate, since the data frame is already rowwise:
behaviors %>%
  nest_by(term) %>%
  mutate(distance = calc_distance_from_beh(data)) %>%
  ungroup()
-output
# A tibble: 2 × 3
term data distance
<chr> <list<tibble[,2]>> <dbl>
1 abandon [3 × 2] 28.4
2 abet [3 × 2] 3.95
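As a cross-check, the same numbers can probably be obtained without nesting at all, by joining the optimal point onto the long data once and summarising per term. This is only a sketch, not the approach from the question: it assumes every term has all three dim values (otherwise it silently sums over fewer dimensions, whereas the original function would return NA), and like calc_distance_from_beh it returns the sum of squared differences without taking a square root.

```r
library(dplyr)
library(tibble)

behaviors <- tibble(
  term = rep(c("abandon", "abet"), each = 3),
  estimate = c(-3.31, -0.08, -0.11, 0.03, 0.34, -0.18),
  dim = c("E", "P", "A", "E", "P", "A")
)

optimal_behavior <- tibble(
  actor = "civil_engineer",
  object = "civil_engineer",
  opt_beh = c(1.905645, 0.9960085, -0.17772678),
  dim = c("E", "P", "A")
)

# Attach the optimal coordinate for each dimension, then sum the squared
# differences within each term (assumes each term has E, P and A rows).
behaviors_distance <- behaviors %>%
  left_join(select(optimal_behavior, dim, opt_beh), by = "dim") %>%
  group_by(term) %>%
  summarise(distance = sum((estimate - opt_beh)^2), .groups = "drop")

behaviors_distance
```

With ~750 terms this avoids calling left_join once per behavior, at the cost of dropping the nested data column from the result.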