
R dplyr: How to add extra rows to a df using group info and missing entries of a particular column from a codelist?

I have:

group  items  values
grp1   A      1
grp1   B      2
grp2   B      3

I want:

group  items  values
grp1   A      1
grp1   B      2
grp1   C      NA
grp2   A      NA
grp2   B      3
grp2   C      NA

"group" is taken from the input df. "items" is taken from a codelist vector containing all possible entries. All other columns are filled in where the value is known, and NA otherwise.

Example:

item_codelist <- c("A", "B", "C")

input <- data.frame("group" = c("grp1", "grp1", "grp2"), "items" = c("A", "B", "B"), "values" = c(1, 2, 3))

I looked into fill(), expand() and complete(), but could not get any of them to work for this purpose.

Below is my current workaround, but I find it somewhat complicated, and it uses a for loop, which will take forever on my 200 MB data frame...

If you know an easier way to do this (preferably in dplyr syntax), let me know. Thanks!


library(dplyr)

# create a data frame with all groups and items
codelist_df <- input %>% head(0) %>% select(group, items)
for (grp in unique(input$group)){
  df <- data.frame("items" = item_codelist) %>%
    mutate(group = grp, .before = 1)
  codelist_df <- bind_rows(codelist_df, df)
}

# join that data frame to the input data (keys stated explicitly)
output <- input %>%
  full_join(codelist_df, by = c("group", "items")) %>%
  arrange(group, items)
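The loop above only builds every group-by-item combination, which tidyr::expand_grid() can do in a single vectorised call; a left join then brings the known values back. A minimal sketch, assuming the same input and item_codelist as in the example. Note the design choice: left_join() drops any input rows whose items value is not in the codelist, whereas the full_join() above keeps them.

```r
library(dplyr, warn.conflicts = FALSE)
library(tidyr)

item_codelist <- c("A", "B", "C")
input <- data.frame(group = c("grp1", "grp1", "grp2"),
                    items = c("A", "B", "B"),
                    values = c(1, 2, 3))

# One row per (group, item) pair: the unique groups crossed with the full codelist.
codelist_df <- expand_grid(group = unique(input$group), items = item_codelist)

# Keep every combination; pairs with no match in `input` get NA in `values`.
output <- codelist_df %>%
  left_join(input, by = c("group", "items")) %>%
  arrange(group, items)
output
```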

Stefan's comment is by far the best solution (I was unaware of it), but here's one option:

library(dplyr)
library(purrr)
library(tidyr)

input <- data.frame("group" = c("grp1", "grp1", "grp2"), "items" = c("A", "B", "B"), "values" = c(1, 2, 3))

items <- c("A", "B", "C") 

input %>% 
  split(.$group) %>% 
  # full join each group with the codelist, then fill the group label
  # within that group only (down and up), so it cannot leak into the next group
  map_df(~ full_join(.x, tibble(items = items), by = "items") %>% 
           fill(group, .direction = "downup") %>% 
           arrange(items))
#>   group items values
#> 1  grp1     A      1
#> 2  grp1     B      2
#> 3  grp1     C     NA
#> 4  grp2     A     NA
#> 5  grp2     B      3
#> 6  grp2     C     NA
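For comparison, tidyr::complete() can do the whole thing in one call once the codelist is passed as a named argument: group is expanded to the groups present in the data, while items = item_codelist forces every codelist entry to appear, including ones such as "C" that never occur in input. Missing combinations get NA in values. A minimal sketch, assuming the example data from the question:

```r
library(tidyr)

item_codelist <- c("A", "B", "C")
input <- data.frame(group = c("grp1", "grp1", "grp2"),
                    items = c("A", "B", "B"),
                    values = c(1, 2, 3))

# `group` takes its values from the data; `items = item_codelist`
# supplies the full set of items, even those absent from `input`.
# Combinations not found in `input` are added with NA in `values`.
output <- complete(input, group, items = item_codelist)
output
```

The same result per group can be written as input %>% group_by(group) %>% complete(items = item_codelist), which reads closer to the question's wording.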

It seems like you want to cross join the groups and items. To do that, you could use dplyr::full_join() with the argument by = character(), and then left join the values back in:

library(dplyr, warn.conflicts = FALSE)

item_codelist <- tibble(items = c('A', 'B', 'C'))

groups <- tibble(group = c('grp1', 'grp1', 'grp2'))

input <- tibble("group" = c("grp1", "grp1", "grp2"), "items" = c("A", "B", "B"), "values" = c(1, 2, 3))

item_codelist |> 
  full_join(groups, by = character()) |> 
  left_join(input, by = c('items', 'group')) |> 
  relocate(group) |> 
  arrange(group, items) |> 
  distinct()

#> # A tibble: 6 × 3
#>   group items values
#>   <chr> <chr>  <dbl>
#> 1 grp1  A          1
#> 2 grp1  B          2
#> 3 grp1  C         NA
#> 4 grp2  A         NA
#> 5 grp2  B          3
#> 6 grp2  C         NA

Created on 2022-07-11 by the reprex package (v2.0.1)
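As of dplyr 1.1.0, using by = character() for a cross join is deprecated in favour of an explicit cross_join(). A sketch of the same approach with it, assuming dplyr 1.1.0 or later (and R 4.1 or later for the native pipe, as in the answer above). Taking distinct(group) first yields each group once, so the trailing distinct() is no longer needed:

```r
library(dplyr, warn.conflicts = FALSE)

item_codelist <- tibble(items = c("A", "B", "C"))
input <- tibble(group = c("grp1", "grp1", "grp2"),
                items = c("A", "B", "B"),
                values = c(1, 2, 3))

output <- input |>
  distinct(group) |>                               # one row per group
  cross_join(item_codelist) |>                     # every group paired with every item
  left_join(input, by = c("group", "items")) |>    # bring the known values back
  arrange(group, items)
output
```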

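Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM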