
Joining a dataframe to nested dataframes within purrr::map_*

My aim is to join a dataframe to the dataframes held within a nested list-column, e.g.:

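data(mtcars)
library(tibble) # rownames_to_column
library(dplyr)  # rename, select, group_by, mutate, inner_join
library(tidyr)  # nest
library(purrr)  # map_*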

mtcars_nest <- mtcars %>% rownames_to_column() %>% rename(rowname_1 = rowname) %>% select(-mpg) %>% group_by(cyl) %>% nest()
mtcars_mpg <- mtcars %>% rownames_to_column() %>% rename(rowname_2 = rowname) %>% select(rowname_2, mpg)

join_df <- function(df_nest, df_other) {
  df_all <- df_nest %>% inner_join(df_other, by = c("rowname_1" = "rowname_2"))
}

join_df <- mtcars_nest %>%
  mutate(new_mpg = map_df(data, join_df(., mtcars_mpg)))

This returns the following error:

# Error in mutate_impl(.data, dots) : Evaluation error: `by` can't contain join column `rowname_1` which is missing from LHS.

So the dataframe that map_* receives from the nested input doesn't seem to offer a column name (i.e. rowname_1) to take part in the join, and I can't work out why. I'm passing the data column, which contains the dataframes, from the nested dataframe. I want a dataframe output that can be added as a new column in the input nested dataframe, e.g.:

| rowname_1 | cyl | disp |...|mpg|
|:----------|:----|:-----|:--|:--|

A couple of things:

  • you should use the tilde (~) so that purrr treats the argument to map* as a function; and
  • I think you should be using map instead of map_df. Though I cannot find exactly why map_df doesn't work here, I can get what I think is your desired behavior without it.
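To make the two points a bit more concrete: without the tilde, the `.` in `join_df(., mtcars_mpg)` is filled in by the pipe with its left-hand side (the whole nested frame, which has only `cyl` and `data`), so `rowname_1` really is missing from it. With `~`, purrr builds a function and `.` becomes one element of `data` per call. Here is a minimal sketch on a toy list; the names `frames`, `n_rows`, `as_list` and `as_frame` are made up for illustration. It also shows the difference in shape between `map()` and `map_df()`, which is probably why the latter does not fit into a list-column inside `mutate()`.

```r
library(purrr)

# Two small data frames standing in for the nested `data` column.
frames <- list(
  a = data.frame(id = 1:2, x = c(10, 20)),
  b = data.frame(id = 3:5, x = c(30, 40, 50))
)

# With `~`, purrr turns the expression into a function and
# `.` is one element of `frames` per call.
n_rows <- map_int(frames, ~ nrow(.))

# map() keeps one result per input: a list of length 2.
as_list <- map(frames, ~ transform(., y = x * 2))

# map_df() row-binds the results into a single 5-row data frame
# (it needs dplyr installed), so the result no longer has one
# element per input.
as_frame <- map_df(frames, ~ transform(., y = x * 2))
```

Inside `mutate()`, the new column needs one entry per row of the nested frame, which is what the list returned by `map()` provides.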

Minor point:

  • you assign to df_all within join_df(), and the only reason it works is that an assignment invisibly returns the value assigned. I suggest being explicit: either follow up with return(df_all), or don't assign at all and simply end with inner_join(...).
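A small base-R sketch of that invisible return, in case it helps; `f_assign` and `f_plain` are hypothetical names used only here.

```r
# Last expression is an assignment: the value is returned, but invisibly.
f_assign <- function(x) {
  out <- x + 1
}

# Last expression is the value itself: returned visibly.
f_plain <- function(x) {
  x + 1
}

f_assign(1)        # prints nothing at the console
y <- f_assign(1)   # the value is still returned and can be captured
f_plain(1)         # prints the result

# withVisible() reports whether a result would auto-print.
vis_assign <- withVisible(f_assign(1))$visible
vis_plain  <- withVisible(f_plain(1))$visible
```

Both functions behave the same inside `map()`, since the value is captured either way; the explicit form is just easier to read and to test interactively.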

Try this:

library(tibble) # rownames_to_column
library(dplyr)
library(tidyr)  # nest
library(purrr)

join_df <- function(df_nest, df_other) {
  df_all <- inner_join(df_nest, df_other, by = c("rowname_1" = "rowname_2"))
  return(df_all)
}

mtcars_nest %>%
  mutate(new_mpg = map(data, ~ join_df(., mtcars_mpg)))
# # A tibble: 3 x 3
#     cyl data               new_mpg           
#   <dbl> <list>             <list>            
# 1    6. <tibble [7 x 10]>  <tibble [7 x 11]> 
# 2    4. <tibble [11 x 10]> <tibble [11 x 11]>
# 3    8. <tibble [14 x 10]> <tibble [14 x 11]>

new_mpg is effectively the data column with one additional column. Since the two are fully redundant, you can always overwrite (or remove) data:

mtcars_nest %>%
  mutate(data = map(data, ~ join_df(., mtcars_mpg)))
# # A tibble: 3 x 2
#     cyl data              
#   <dbl> <list>            
# 1    6. <tibble [7 x 11]> 
# 2    4. <tibble [11 x 11]>
# 3    8. <tibble [14 x 11]>

and get your nested, now augmented, frames.
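If what you ultimately want is a flat frame like the one sketched in the question (rowname_1, cyl, disp, ..., mpg), one option is to unnest the augmented column afterwards. The following is only a sketch under the assumption of a reasonably recent tidyr (1.0 or later, for `unnest(cols = ...)`); it rebuilds the inputs so it can be run on its own, and the names `augmented` and `flat` are mine.

```r
library(tibble)
library(dplyr)
library(tidyr)
library(purrr)

# Rebuild the nested input and the lookup frame from the question.
mtcars_nest <- mtcars %>%
  rownames_to_column("rowname_1") %>%
  select(-mpg) %>%
  group_by(cyl) %>%
  nest() %>%
  ungroup()   # drop grouping so mutate() works on all rows at once

mtcars_mpg <- mtcars %>%
  rownames_to_column("rowname_2") %>%
  select(rowname_2, mpg)

# Join mpg back onto each nested frame, as in the answer above.
augmented <- mtcars_nest %>%
  mutate(data = map(
    data,
    ~ inner_join(., mtcars_mpg, by = c("rowname_1" = "rowname_2"))
  ))

# Flatten the list-column back into ordinary rows.
flat <- augmented %>% unnest(cols = data)
```

A quick sanity check is that `flat` has one row per car and that its `mpg` values match the original `mtcars`.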


 