
Getting the set of nodes connected up to the main parent node in R

I have a data set with 6 rows and 3 columns. The first column holds the children, and the second and third columns hold the immediate parents of the corresponding child.

[image: the data set]

Above, one can see that "a" and "b" don't have any parents, whereas "c" has only one parent, "a". "d" has parents "b" and "c", and so on.

What I need is: given a child as input, it should give me all the ancestors of that child, including the child itself.

E.g. if "f" is the child I chose, then the desired output should be: {"f", "d", "b"}, {"f", "d", "c", "a"}, {"f", "e", "b"}, {"f", "e", "c", "a"}.

Note: The order of the nodes does not matter.

Thank you so much in advance.

Create sample data. Note the use of stringsAsFactors here; I'm assuming your data are characters and not factors:

> d <- data.frame(list("c" = c("a", "b", "c", "d", "e", "f"), "p1" = c(NA, NA, "a", "b", "b", "d"), "p2" = c(NA, NA, NA, "c", "c", "e")),stringsAsFactors=FALSE)

First tidy it up: make the data long, not wide, with each row being a child-parent pair:

> pairs = subset(reshape2::melt(d,id.vars="c",value.name="parent"), !is.na(parent))[,c("c","parent")]
> pairs
   c parent
3  c      a
4  d      b
5  e      b
6  f      d
10 d      c
11 e      c
12 f      e
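
If reshape2 is not available, the same long format can probably be built in base R. This is only a sketch under the assumption that there are exactly two parent columns named p1 and p2, as in the sample data; the names long and pairs_base are made up for illustration:

```r
# Sample data, as in the answer above.
d <- data.frame(c  = c("a", "b", "c", "d", "e", "f"),
                p1 = c(NA, NA, "a", "b", "b", "d"),
                p2 = c(NA, NA, NA, "c", "c", "e"),
                stringsAsFactors = FALSE)

# Stack the two parent columns under one "parent" column,
# repeating the child column once per parent column.
long <- data.frame(c = rep(d$c, 2),
                   parent = c(d$p1, d$p2),
                   stringsAsFactors = FALSE)

# Keep only the rows that really have a parent.
pairs_base <- long[!is.na(long$parent), ]
pairs_base
```

This should hold the same seven child-parent pairs as pairs above.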

Now we can make a graph of the parent-child relationships. This is a directed graph, so each child-parent pair is plotted as an arrow from child to parent:

> library(igraph)
> g = graph_from_data_frame(pairs)
> plot(g)

[image: plot of the graph]

Now I'm not sure exactly what you want, but igraph functions can do almost anything. So for example, here's a search of the graph starting at d, from which we can get various bits of information:

> d_search = bfs(g,"d",neimode="out", unreachable=FALSE, order=TRUE, dist=TRUE)

First, which nodes are ancestors of d? They're the ones that can be reached from d via the exhaustive (here, breadth-first) search:

> d_search$order
+ 6/6 vertices, named:
[1] d    c    b    a    <NA> <NA>

Note it includes d as well, which is trivial enough to drop from this list. That gives you the set of ancestors of d, which is what you asked for.
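
If only the set of ancestors is needed, igraph's subcomponent() may be a shorter route than a full bfs() call. A minimal sketch, assuming the same child-to-parent edge direction as above; the pairs are rebuilt inline so the snippet runs on its own, and the names reached and ancestors_d are just illustrative:

```r
library(igraph)

# Child-parent pairs from the sample data (same content as `pairs` above).
pairs <- data.frame(c      = c("c", "d", "e", "f", "d", "e", "f"),
                    parent = c("a", "b", "b", "d", "c", "c", "e"),
                    stringsAsFactors = FALSE)
g <- graph_from_data_frame(pairs)

# Everything reachable from "d" along child -> parent edges, "d" included.
reached <- as_ids(subcomponent(g, "d", mode = "out"))

# Drop the starting node to keep only the ancestors.
ancestors_d <- setdiff(reached, "d")
ancestors_d
```

For the sample data this should contain a, b and c.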

What is the relationship of those nodes to d?

> d_search$dist
  c   d   e   f   a   b 
  1   0 NaN NaN   2   1

We see that e and f are unreachable, so they are not ancestors of d. c and b are direct parents, and a is a grandparent. You can check this from the graph.

You can also get all the paths from any child upwards using functions like shortest_paths and so on.
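
For the output asked for in the question (every line from the child up to a node without parents), all_simple_paths() looks like a better fit than shortest_paths, since it does not discard the longer routes. A sketch only, assuming the graph has no cycles; the names roots, paths and lines_f are illustrative:

```r
library(igraph)

# Child-parent pairs from the sample data (same content as `pairs` above).
pairs <- data.frame(c      = c("c", "d", "e", "f", "d", "e", "f"),
                    parent = c("a", "b", "b", "d", "c", "c", "e"),
                    stringsAsFactors = FALSE)
g <- graph_from_data_frame(pairs)

# Top-level ancestors: vertices with no outgoing (child -> parent) edge.
roots <- V(g)[degree(g, mode = "out") == 0]

# Every simple path from "f" to one of those roots is one family line.
paths <- all_simple_paths(g, from = "f", to = roots, mode = "out")

# Convert the vertex sequences to plain character vectors.
lines_f <- lapply(paths, as_ids)
lines_f
```

For "f" this should give the four lines f-d-b, f-d-c-a, f-e-b and f-e-c-a, in whatever order igraph happens to visit them.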

Here is a recursive function that builds all possible family lines:

d <- data.frame(list("c" = c("a", "b", "c", "d", "e", "f"), 
      "p1" = c(NA, NA, "a", "b", "b", "d"), 
      "p2" = c(NA, NA, NA, "c", "c", "e")), stringsAsFactors = FALSE)

# Make data more convenient for the task.
library(reshape2)
dp <-  melt(d, id = c("c"), value.name = "p") 

# Recursive function builds ancestor vectors.
getAncestors <- function(data, x, ancestors = list(x)) {

  parents <- subset(data, c %in% x & !is.na(p), select = c("c", "p"))

  if(nrow(parents) == 0) {
    return(ancestors)
  }

  x.c <- parents$c
  p.c <- parents$p

  ancestors <- lapply(ancestors, function(x) {
    if (is.null(x)) return(NULL)

    # Here we want to repeat ancestor chain for each new parent.
    res <- list()
    matches <- 0
    for (i in seq_len(nrow(parents))) {
      if (tail(x, 1) == parents[i, ]$c){
       # Append rather than index by i, so res never gets NULL gaps.
       res[[length(res) + 1]] <- c(x, parents[i, ]$p)
       matches <- matches + 1
      }
    }

    if (matches == 0) { # There are no more parents. 
      res[[1]] <- x
    }

    return (res)
  })

  # remove one level of lists.
  ancestors <- unlist(ancestors, recursive = FALSE)

  res <- getAncestors(data, p.c, ancestors)
  return (res)

}

# Demo of results for the lowest level.
res <- getAncestors(dp, "f")
res
#[[1]]
#[1] "f" "d" "b"

#[[2]]
#[1] "f" "d" "c" "a"

#[[3]]
#[1] "f" "e" "b"

#[[4]]
#[1] "f" "e" "c" "a"

You will need to implement this in a similar way through recursion or with a while loop.
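
The while-loop variant could look roughly like the sketch below. It is not taken from any of the answers: family_lines is a hypothetical helper name, it expects a long child-parent table with columns c and parent, and it assumes the data contain no cycles (otherwise the loop would never finish).

```r
# Child-parent pairs from the sample data.
pairs <- data.frame(c      = c("c", "d", "e", "f", "d", "e", "f"),
                    parent = c("a", "b", "b", "d", "c", "c", "e"),
                    stringsAsFactors = FALSE)

# Hypothetical helper: collect every line from `child` up to a parentless node.
family_lines <- function(pairs, child) {
  stack <- list(child)  # partial lines that still need extending
  done  <- list()       # lines that have reached a node without parents

  while (length(stack) > 0) {
    # Take the most recently added partial line off the stack.
    line <- stack[[length(stack)]]
    stack[[length(stack)]] <- NULL

    # Parents of the last node on this line.
    parents <- pairs$parent[pairs$c == line[length(line)]]

    if (length(parents) == 0) {
      done[[length(done) + 1]] <- line        # no parents: the line is complete
    } else {
      for (p in parents) {
        stack[[length(stack) + 1]] <- c(line, p)  # one new line per parent
      }
    }
  }
  done
}

res_loop <- family_lines(pairs, "f")
res_loop
```

The lines come back in stack order rather than in the order shown above, which should be fine given that the question says order does not matter.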
