Remove duplicate elements from a list of tidygraph objects in R?

Question

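I have a list of tidygraph objects. In the node data I end up with two columns, name and frequency. What I want to do is remove the list elements (i.e. tidygraph objects) that are repeated, so that each distinct graph appears only once. Hopefully my example explains this better: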

First, I create some node/edge data, convert it to tidygraph objects and put them in a list:

library(tidygraph)
library(dplyr)
library(tidyr)
library(purrr)
library(stringr)


# create some node and edge data for the tbl_graph
nodes <- data.frame(name = c("x4", NA, NA),
                    val = c(1, 5, 2))
nodes2 <- data.frame(name = c("x4", NA, NA),
                     val = c(3, 2, 2))
nodes3 <- data.frame(name = c("x4", NA, NA),
                     val = c(5, 6, 7))
nodes4 <- data.frame(name = c("x4", "x2", NA, NA, "x1", NA, NA),
                     val = c(3, 2, 2, 1, 1, 2, 7))
nodes5 <- data.frame(name = c("x1", "x2", NA),
                     val = c(7, 4, 2))
nodes6 <- data.frame(name = c("x1", "x2", NA),
                     val = c(2, 1, 3))

edges <- data.frame(from = c(1,1), to = c(2,3))
edges1 <- data.frame(from = c(1, 2, 2, 1, 5, 5),
                     to    = c(2, 3, 4, 5, 6, 7))

# create the tbl_graphs
tg   <- tbl_graph(nodes = nodes,  edges = edges)
tg_1 <- tbl_graph(nodes = nodes2, edges = edges)
tg_2 <- tbl_graph(nodes = nodes3, edges = edges)
tg_3 <- tbl_graph(nodes = nodes4, edges = edges1)
tg_4 <- tbl_graph(nodes = nodes5, edges = edges)
tg_5 <- tbl_graph(nodes = nodes6, edges = edges)


# put into list
myList <- list(tg, tg_1, tg_2, tg_3, tg_4, tg_5)

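Then I have this small function that tells me the frequency of each list element based on the name column. That is, if the name column is repeated (identical) across several list elements, the frequency increases. So in my example above, the name column of tg appears 3 times in my list (it is identical in tg, tg_1 and tg_2), so its frequency is 3.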
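To make the "one key per graph, count identical keys" idea concrete, here is a minimal base-R sketch. It assumes plain character vectors stand in for the result of `pull(name)` on each tbl_graph; `name_list` and `graph_key()` are hypothetical names used only for illustration, not part of tidygraph. Counting with `table()` gives every graph the total number of list elements sharing its key.

```r
# Plain character vectors stand in for pull(name) on each tbl_graph (assumption).
name_list <- list(
  c("x4", NA, NA),
  c("x4", NA, NA),
  c("x4", NA, NA),
  c("x4", "x2", NA, NA, "x1", NA, NA),
  c("x1", "x2", NA),
  c("x1", "x2", NA)
)

# Hypothetical helper: replace NA with ".." and collapse the names into one key,
# mirroring replace_na("..") %>% paste0(collapse = "") in the question.
graph_key <- function(nm) {
  nm[is.na(nm)] <- ".."
  paste0(nm, collapse = "")
}

keys <- vapply(name_list, graph_key, character(1))

# Total number of graphs sharing each key, aligned with the list order.
total_freq <- as.vector(table(keys)[keys])
total_freq
```

Here the three identical `x4, NA, NA` graphs each get 3, the seven-node graph gets 1 and the two `x1, x2, NA` graphs each get 2.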

Then I add a frequency column to each list element, building a modified copy of my original myList object called newList. For example:

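# build one key string per graph from its node names (NA replaced by "..")
freqs <- lapply(myList, function(x){
  x %>% 
    pull(name) %>%
    replace_na("..") %>%
    paste0(collapse = "")
}) %>%
  unlist(use.names = FALSE) %>%
  as_tibble() %>%
  group_by(value) %>%
  # n():1 numbers the rows of each group from the group size down to 1
  mutate(val = n():1) %>%
  pull(val)

# attach the frequency to each graph's node data and keep only name and frequency
newList <- purrr::imap(myList, ~.x %>% 
              mutate(frequency = freqs[.y]) %>% 
              select(name, frequency))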

Looking at newList now returns:

> newList
[[1]]
# A tbl_graph: 3 nodes and 2 edges
#
# A rooted tree
#
# Node Data: 3 × 2 (active)
  name  frequency
  <chr>     <int>
1 x4            3
2 NA            3
3 NA            3
#
# Edge Data: 2 × 2
   from    to
  <int> <int>
1     1     2
2     1     3

[[2]]
# A tbl_graph: 3 nodes and 2 edges
#
# A rooted tree
#
# Node Data: 3 × 2 (active)
  name  frequency
  <chr>     <int>
1 x4            2
2 NA            2
3 NA            2
#
# Edge Data: 2 × 2
   from    to
  <int> <int>
1     1     2
2     1     3

[[3]]
# A tbl_graph: 3 nodes and 2 edges
#
# A rooted tree
#
# Node Data: 3 × 2 (active)
  name  frequency
  <chr>     <int>
1 x4            1
2 NA            1
3 NA            1
#
# Edge Data: 2 × 2
   from    to
  <int> <int>
1     1     2
2     1     3

[[4]]
# A tbl_graph: 7 nodes and 6 edges
#
# A rooted tree
#
# Node Data: 7 × 2 (active)
  name  frequency
  <chr>     <int>
1 x4            1
2 x2            1
3 NA            1
4 NA            1
5 x1            1
6 NA            1
# … with 1 more row
#
# Edge Data: 6 × 2
   from    to
  <int> <int>
1     1     2
2     2     3
3     2     4
# … with 3 more rows

[[5]]
# A tbl_graph: 3 nodes and 2 edges
#
# A rooted tree
#
# Node Data: 3 × 2 (active)
  name  frequency
  <chr>     <int>
1 x1            2
2 x2            2
3 NA            2
#
# Edge Data: 2 × 2
   from    to
  <int> <int>
1     1     2
2     1     3

[[6]]
# A tbl_graph: 3 nodes and 2 edges
#
# A rooted tree
#
# Node Data: 3 × 2 (active)
  name  frequency
  <chr>     <int>
1 x1            1
2 x2            1
3 NA            1
#
# Edge Data: 2 × 2
   from    to
  <int> <int>
1     1     2
2     1     3

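So we can see that the name column x4, NA, NA appears 3 times... but instead of every copy getting the same total frequency, I seem to be counting the frequency down (not intentionally)... so x4, NA, NA is reported with frequency 3, then 2, then 1.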
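The count-down comes from `n():1`: inside a grouped `mutate()`, `n()` is the size of the current group, so `n():1` hands out 3, 2, 1 to the three identical graphs, whereas `mutate(val = n())` would give each of them 3. The base-R sketch below reproduces both behaviours with `ave()`; the hard-coded `keys` vector is an assumption standing in for the collapsed node names.

```r
keys <- c("x4....", "x4....", "x4....", "x4x2....x1....", "x1x2..", "x1x2..")

# Mirrors mutate(val = n():1): within each key, count down from the group size.
countdown <- ave(seq_along(keys), keys, FUN = function(i) rev(seq_along(i)))

# Mirrors mutate(val = n()): every member of a group gets the full group size.
total <- ave(seq_along(keys), keys, FUN = length)

countdown
total
```

`countdown` matches the 3, 2, 1, 1, 2, 1 pattern seen in newList above, while `total` gives 3, 3, 3, 1, 2, 2.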

I am trying to remove any duplicated list elements and keep only the one with the highest frequency. For example, my desired output would look like:

> newList
[[1]]
# A tbl_graph: 3 nodes and 2 edges
#
# A rooted tree
#
# Node Data: 3 × 2 (active)
  name  frequency
  <chr>     <int>
1 x4            3
2 NA            3
3 NA            3
#
# Edge Data: 2 × 2
   from    to
  <int> <int>
1     1     2
2     1     3

[[2]]
# A tbl_graph: 7 nodes and 6 edges
#
# A rooted tree
#
# Node Data: 7 × 2 (active)
  name  frequency
  <chr>     <int>
1 x4            1
2 x2            1
3 NA            1
4 NA            1
5 x1            1
6 NA            1
# … with 1 more row
#
# Edge Data: 6 × 2
   from    to
  <int> <int>
1     1     2
2     2     3
3     2     4
# … with 3 more rows

[[3]]
# A tbl_graph: 3 nodes and 2 edges
#
# A rooted tree
#
# Node Data: 3 × 2 (active)
  name  frequency
  <chr>     <int>
1 x1            2
2 x2            2
3 NA            2
#
# Edge Data: 2 × 2
   from    to
  <int> <int>
1     1     2
2     1     3

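Here we can see that the duplicated elements (the copies with the lower, counted-down frequencies) have been removed... Any suggestions on how I can do this?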
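One possible approach, sketched in base R: because the count-down assigns the largest value to the first occurrence of each key, keeping the first occurrence with `!duplicated()` keeps exactly that copy, and it preserves the original list order. The `stand_ins` list is a hypothetical placeholder for the six tbl_graph objects, and `keys` is hard-coded as an assumption.

```r
keys <- c("x4....", "x4....", "x4....", "x4x2....x1....", "x1x2..", "x1x2..")
stand_ins <- as.list(paste0("graph_", 1:6))  # hypothetical stand-ins for myList

keep <- !duplicated(keys)              # TRUE only for the first graph with each key
freq <- as.vector(table(keys)[keys])   # total copies per key, in list order

deduped      <- stand_ins[keep]        # elements 1, 4 and 5 survive
deduped_freq <- freq[keep]             # their frequencies: 3, 1, 2
```

With the real objects, `myList[keep]` could then go through the same `imap()` / `mutate(frequency = ...)` step as in the question, using `deduped_freq` as the frequency vector.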

Answer 1

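A comment on the original answer would have been enough reason for me to revise it. That said, here is a slightly updated version of the code that uses slice() to keep the first row of each group of the grouped tibble, maybe like this: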

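library(tidygraph) ; library(tidyverse)

# one key string per graph: node names collapsed, NA replaced by ".."
freqs <- map(myList, function(x){
  x %>% 
    pull(name) %>%
    replace_na("..") %>%
    paste0(collapse = "")
}) %>%
  unlist(use.names = FALSE) %>%
  as_tibble() %>%
  mutate(ids = 1:n()) %>%       # remember each graph's position in myList
  group_by(value) %>%
  mutate(val = n():1)           # first graph in each group gets the group size

# slice(1) keeps the first row of every group, i.e. the copy with the highest val
ids <- freqs %>% slice(1) %>% pull(ids)
freqs <- freqs %>% pull(val)

newList <- purrr::imap(myList, ~.x %>% 
                         mutate(frequency = freqs[.y]) %>% 
                         select(name, frequency))

# sort ids so the kept graphs stay in their original list order
newList[sort(ids)]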

[[1]]
# A tbl_graph: 3 nodes and 2 edges
#
# A rooted tree
#
# Node Data: 3 x 2 (active)
  name  frequency
  <chr>     <int>
1 x4            3
2 NA            3
3 NA            3
#
# Edge Data: 2 x 2
   from    to
  <int> <int>
1     1     2
2     1     3

[[2]]
# A tbl_graph: 7 nodes and 6 edges
#
# A rooted tree
#
# Node Data: 7 x 2 (active)
  name  frequency
  <chr>     <int>
1 x4            1
2 x2            1
3 NA            1
4 NA            1
5 x1            1
6 NA            1
# ... with 1 more row
#
# Edge Data: 6 x 2
   from    to
  <int> <int>
1     1     2
2     2     3
3     2     4
# ... with 3 more rows

[[3]]
# A tbl_graph: 3 nodes and 2 edges
#
# A rooted tree
#
# Node Data: 3 x 2 (active)
  name  frequency
  <chr>     <int>
1 x1            2
2 x2            2
3 NA            2
#
# Edge Data: 2 x 2
   from    to
  <int> <int>
1     1     2
2     1     3

Answer 1 (score 0), posted 2021-11-19 15:12:44
