
igraph graph.data.frame silently converts factors to character vectors

Today I learned that igraph silently drops factors in graph.data.frame: factors in the vertex data frame are converted to character vectors. Is there a way to retain the factor type, e.g. for V(g)$factor_var and for df <- get.data.frame(g, what="vertices"); df$factor_var? In the following code, gender is the factor_var:

actors <- data.frame(name=c("Alice", "Bob", "Cecil", "David", "Esmeralda"),
                     age=c(48,33,45,34,21),
                     gender=factor(c("F","M","F","M","F")))
relations <- data.frame(from=c("Bob", "Cecil", "Cecil", "David",
                               "David", "Esmeralda"),
                        to=c("Alice", "Bob", "Alice", "Alice", "Bob", "Alice"),
                        same.dept=c(FALSE,FALSE,TRUE,FALSE,FALSE,TRUE),
                        friendship=c(4,5,5,2,1,1), advice=c(4,5,5,4,2,3))
g <- graph.data.frame(relations, directed=TRUE, vertices=actors)
g_actors <- get.data.frame(g, what="vertices")

# Compare type of gender (before and after)
is.factor(actors$gender)
is.factor(g_actors$gender)

In this reproducible example, actors$gender is a factor but g_actors$gender is not. In my opinion, it should be. I found no comment about this issue in the documentation.

This is important because exporting vertices via get.data.frame for linear regression loses the factors (linear regression converts factors to dummy variables, but ignores character vectors). I noticed because my factor variables disappeared in the output.

Of course, I can recreate the factors after exporting from igraph, but this is tedious because I have a lot of graphs and the level ordering comes out wrong. I also do not believe it should be necessary, unless igraph cannot support this behavior across its C++ and Python versions.
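A minimal sketch of the "recreate the factors after exporting" approach, written so it can be reused across many graphs. The helper name restore_factors is hypothetical (it is not part of igraph), and the exported data frame below is a hand-built stand-in for the result of get.data.frame(g, what="vertices"), so that the example runs in base R without igraph. It assumes the original vertex data frame is still available as a template.

```r
# Hypothetical helper (not part of igraph): re-apply the factor levels of the
# original vertex data frame to a data frame exported from the graph.
restore_factors <- function(exported, template) {
  for (col in intersect(names(template), names(exported))) {
    if (is.factor(template[[col]])) {
      # Reuse the template's levels so the original level order is preserved.
      exported[[col]] <- factor(exported[[col]],
                                levels = levels(template[[col]]),
                                ordered = is.ordered(template[[col]]))
    }
  }
  exported
}

# Original vertex data: gender is a factor with a non-alphabetical level order.
template <- data.frame(name = c("Alice", "Bob", "Cecil"),
                       gender = factor(c("F", "M", "F"), levels = c("M", "F")),
                       stringsAsFactors = FALSE)

# Stand-in for get.data.frame(g, what = "vertices"): gender is now character.
exported <- data.frame(name = c("Alice", "Bob", "Cecil"),
                       gender = c("F", "M", "F"),
                       stringsAsFactors = FALSE)

restored <- restore_factors(exported, template)
is.factor(restored$gender)
levels(restored$gender)
```

Only columns that are factors in the template are touched, so ordinary character columns such as name stay as they are.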

Ryan

Yes, graph.data.frame has

newval <- d[, i]
if (class(newval) == "factor") {
  newval <- as.character(newval)
}
attrs[[names(d)[i]]] <- newval

so it converts factors to characters. I am not sure why, but it has been there forever: https://github.com/igraph/igraph/blame/c5849a89739c0dd058ff0a770aff2443745636fa/interfaces/R/igraph/R/structure.generators.R#L602
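To illustrate what that conversion costs, here is a small base R sketch (no igraph involved; the variable names are made up for the example). It shows that once a factor has gone through as.character, a naive factor() call rebuilds the levels in sorted order, which is why the level ordering can come out wrong after a round trip.

```r
# A factor with a deliberately non-alphabetical level order.
gender <- factor(c("F", "M", "F", "M", "F"), levels = c("M", "F"))

# The same conversion the quoted lines apply to each factor column.
as_chr <- as.character(gender)

# Naive re-creation: levels are rebuilt from the sorted unique values.
rebuilt <- factor(as_chr)

levels(gender)   # "M" "F"
levels(rebuilt)  # "F" "M"
```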

As a workaround, you can create a copy of the function, under a different name, and remove these three lines.

If you think that this is a bug, then please also open an issue at https://github.com/igraph/igraph/issues and I'll add an option not to convert. I think the default will still be to convert, just because it has been there for a long time, and people might rely on it.
