
Using map_dbl with a nested df not accessing the dataframe correctly?

I'm working on a project where I need to find the distance between a set of behaviors measured in 3-dimensional space and a pre-identified point in the same space. I wrote a function that calculates the distance between the point and a single behavior, and it works when I apply it to one behavior. However, I need to apply it to ~750 behaviors in a larger data frame, so I am hoping to nest the larger behaviors data frame by term and then apply the function to each of the nested data frames using map_dbl. However, I keep getting the error:

Error: Problem with `mutate()` column `distance`.
ℹ `distance = map_dbl(data, calc_distance_from_beh)`.
x Join columns must be present in data.
x Problem with `dim`.
ℹ The error occurred in row 1.

It seems that when map_dbl is applied to the nested data frames, it cannot access the "dim" column to join on, and I'm not sure why.

I've included a reproducible example below with just two behaviors.

Reproducible example:

library(dplyr)
library(purrr)

behaviors <- tibble(term = rep(c("abandon", "abet"), each = 3),
                    estimate = c(-3.31, -0.08, -0.11, 0.03, 0.34, -0.18),
                    dim = c("E", "P", "A", "E", "P", "A"))

optimal_behavior <- tibble(actor = "civil_engineer",
                           object = "civil_engineer",
                           opt_beh = c(1.905645, 0.9960085, -0.17772678),
                           dim = c("E", "P", "A"))


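calc_distance_from_beh <- function(nested_df) {
  # optimal_behavior is read from the global environment
  optimal_behavior <- as_tibble(optimal_behavior)
  nested_df <- as_tibble(nested_df)

  # Match each dimension (E, P, A) of the behavior to the optimal point
  df_for_calculations <- left_join(optimal_behavior, nested_df, by = "dim")

  # Sum of squared differences across the three dimensions
  df_for_calculations %>%
    mutate(dist = (estimate - opt_beh)^2) %>%
    summarise(total_dist = sum(dist)) %>%
    pull(total_dist)
}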


behaviors_distance <- behaviors %>%
  nest_by(term) %>%
  mutate(distance = map_dbl(data, calc_distance_from_beh))

Assuming the value column is named estimate (as in the example), just ungroup after the nest_by. nest_by returns a rowwise data frame, so inside mutate the name data refers to the single nested tibble of the current row rather than to the whole list column. map_dbl then iterates over that tibble's columns instead of over the nested data frames, and the join cannot find dim.

library(purrr)
library(dplyr)
behaviors %>%
  nest_by(term) %>%
  ungroup() %>%
  mutate(distance = map_dbl(data, calc_distance_from_beh))
# A tibble: 2 × 3
  term                  data distance
  <chr>   <list<tibble[,2]>>    <dbl>
1 abandon            [3 × 2]    28.4 
2 abet               [3 × 2]     3.95
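
To see what map_dbl actually receives in each case, here is a small diagnostic sketch. It is an illustration only and not part of the original answer: the names nested, rowwise_view and ungrouped_view are made up for this example, and it records the class of each element that map iterates over, with and without ungroup().

```r
library(dplyr)
library(purrr)

behaviors <- tibble(term = rep(c("abandon", "abet"), each = 3),
                    estimate = c(-3.31, -0.08, -0.11, 0.03, 0.34, -0.18),
                    dim = c("E", "P", "A", "E", "P", "A"))

# nest_by() returns a rowwise data frame with a list column `data`
nested <- behaviors %>% nest_by(term)

# Rowwise: `data` is the current row's tibble, so map_chr() walks its columns
rowwise_view <- nested %>%
  mutate(seen = list(map_chr(data, ~ class(.x)[1]))) %>%
  pull(seen)

# Ungrouped: `data` is the whole list column, so map_chr() walks the tibbles
ungrouped_view <- nested %>%
  ungroup() %>%
  mutate(seen = map_chr(data, ~ class(.x)[1])) %>%
  pull(seen)

print(rowwise_view[[1]])  # one entry per column of the nested tibble
print(ungrouped_view)     # one entry per nested tibble
```

In the rowwise case the function is handed the bare estimate and dim vectors one at a time, which is why the join on dim has nothing to match against.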

Or, instead of map, we can apply the function directly in mutate, since the data frame is already rowwise:

behaviors %>%
  nest_by(term) %>%
  mutate(distance = calc_distance_from_beh(data)) %>%
  ungroup()

Output:

# A tibble: 2 × 3
  term                  data distance
  <chr>   <list<tibble[,2]>>    <dbl>
1 abandon            [3 × 2]    28.4 
2 abet               [3 × 2]     3.95
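
As a sanity check on these numbers, the same quantity can be recomputed by hand in base R. This is only a sketch: the vector names opt, abandon and abet are introduced here for illustration, with values copied from the example data. Note that calc_distance_from_beh returns the sum of squared differences, i.e. the squared Euclidean distance; if the plain distance is wanted, take the square root of the result.

```r
# Values copied from the example, ordered E, P, A
opt     <- c(E = 1.905645, P = 0.9960085, A = -0.17772678)
abandon <- c(E = -3.31, P = -0.08, A = -0.11)
abet    <- c(E = 0.03,  P = 0.34,  A = -0.18)

# Sum of squared differences, matching what calc_distance_from_beh computes
sq_dist_abandon <- sum((abandon - opt)^2)
sq_dist_abet    <- sum((abet - opt)^2)

print(signif(sq_dist_abandon, 3))  # 28.4
print(signif(sq_dist_abet, 3))     # 3.95

# Plain Euclidean distance, if that is what is needed
euclid_abandon <- sqrt(sq_dist_abandon)
euclid_abet    <- sqrt(sq_dist_abet)
```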
