R - Name concatenation separator in unlist() (flattening list of lists)

Lists of lists can be flattened with unlist(list, recursive = FALSE), as was shown in this question. Doing so concatenates the list names using a dot (.) as the default separator, which is standard for variable naming in R. A simple example illustrates this:

# Create example list, l
> l <- list("a" = list("x" = 1, "y" = 2), "b" = list("x" = 3, "y" = 4))

> l
$a
$a$x
[1] 1

$a$y
[1] 2


$b
$b$x
[1] 3

$b$y
[1] 4

# Unlist lists in l
> l.unlisted <- unlist(l, recursive = FALSE)

> l.unlisted
$a.x
[1] 1

$a.y
[1] 2

$b.x
[1] 3

$b.y
[1] 4

Despite this naming convention, I want the names to use a different separator (_). This can be done through string manipulation, using sub() to find and replace the default . separator in each name after unlist() has already concatenated the names, as follows:

> names(l.unlisted) <- sub('.', '_', names(l.unlisted), fixed=TRUE)

> l.unlisted
$a_x
[1] 1

$a_y
[1] 2

$b_x
[1] 3

$b_y
[1] 4

While this would be sufficient in most situations, I think the extra renaming step could be eliminated by changing the default separator used by unlist(). I hypothesize that this could be done by editing the source code of the function with fix() to add a sep argument similar to the one used in paste(). However, I do not know how to do so, as unlist() is an internal function.

Is there a way to alter the default name concatenation separator in unlist(), and how can this be done?

While one can search and replace the dots, as suggested in a comment by akrun, this is a hack that does not necessarily work if the names already contain dots. Here's a more robust solution.
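To make that fragility concrete, here is a small sketch. The list l_dotted is a made-up example (not from the question) in which one top-level name already contains a dot. Neither sub() nor gsub() can recover the intended name a.b_x, because after flattening the separator dot is indistinguishable from the dot that was part of the name.

```r
# Hypothetical list whose first top-level name already contains a dot
l_dotted <- list("a.b" = list(x = 1, y = 2), c = list(x = 3))

flat <- unlist(l_dotted, recursive = FALSE)
names(flat)
# "a.b.x" "a.b.y" "c.x"

# sub() replaces only the first dot, which here belongs to the name itself
first_only <- sub(".", "_", names(flat), fixed = TRUE)
first_only
# "a_b.x" "a_b.y" "c_x"

# gsub() replaces every dot, so the original name is altered as well
all_dots <- gsub(".", "_", names(flat), fixed = TRUE)
all_dots
# "a_b_x" "a_b_y" "c_x"
```

In both cases the result differs from a.b_x and a.b_y, which is what a separator-aware approach would produce.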

An example list:

ex_list = list(
  a = c(x1=1, x2=2, x3=3),
  b = c(y1=1, y2=2),
  c = c(z1=1)
)

looks like:

> ex_list
$a
x1 x2 x3 
 1  2  3 

$b
y1 y2 
 1  2 

$c
z1 
 1 

The usual approaches:

> #tries
> unlist(ex_list)
a.x1 a.x2 a.x3 b.y1 b.y2 c.z1 
   1    2    3    1    2    1 
> do.call(what = c, args = ex_list)
a.x1 a.x2 a.x3 b.y1 b.y2 c.z1 
   1    2    3    1    2    1 
> unlist(unname(ex_list))
x1 x2 x3 y1 y2 z1 
 1  2  3  1  2  1 

The first two join the names with a dot (.) separator; the third adds no prefix at all (useful in some cases).

A function:

#with custom separator
unlist2 = function(x, sep = "_") {
  #save top names
  top_names = names(x)
  x = unname(x)

  #flatten
  x2 = unlist(x)

  #add prefix
  #determine how many times to repeat each top-level name
  #(assumes each top-level element is a flat, named vector or list)
  lengths_top = lengths(x)
  prefixes = rep(top_names, times = lengths_top)
  names(x2) = paste0(prefixes, sep, names(x2))

  x2
}

Test it:

> #tests
> unlist2(ex_list)
a_x1 a_x2 a_x3 b_y1 b_y2 c_z1 
   1    2    3    1    2    1 
> unlist2(ex_list, sep = "-")
a-x1 a-x2 a-x3 b-y1 b-y2 c-z1 
   1    2    3    1    2    1 
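The function above flattens everything into a single vector. For the list-of-lists case in the question, where unlist(l, recursive = FALSE) is used to remove only one level, the same idea can be adapted. The following is a minimal sketch, not a drop-in replacement for unlist(): the name flatten_once is made up for this illustration, and it assumes that the top-level list and every element inside it are fully named.

```r
# Hypothetical helper: flatten one level of a named list of named lists,
# joining outer and inner names with a custom separator.
# Assumes names(x) and the names of every element of x are all present.
flatten_once <- function(x, sep = "_") {
  stopifnot(is.list(x), !is.null(names(x)))

  # repeat each top-level name once per element it contains
  prefixes <- rep(names(x), times = lengths(x))

  # drop the top-level names so unlist() keeps only the inner names
  out <- unlist(unname(x), recursive = FALSE)

  names(out) <- paste(prefixes, names(out), sep = sep)
  out
}

# The list from the question
l <- list("a" = list("x" = 1, "y" = 2), "b" = list("x" = 3, "y" = 4))
res <- flatten_once(l)
names(res)
# "a_x" "a_y" "b_x" "b_y"

# Dots that are part of a name are left alone
res_dotted <- flatten_once(list("a.b" = list(x = 1)))
names(res_dotted)
# "a.b_x"
```

Because the prefixes are built before flattening rather than patched in afterwards, dots inside existing names do not interfere with the chosen separator.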

Base R unlist()

The base R function calls .Internal, so we can't modify it easily:

> unlist
function (x, recursive = TRUE, use.names = TRUE) 
{
    if (.Internal(islistfactor(x, recursive))) {
        lv <- unique(.Internal(unlist(lapply(x, levels), recursive, 
            FALSE)))
        nm <- if (use.names) 
            names(.Internal(unlist(x, recursive, use.names)))
        res <- .Internal(unlist(lapply(x, as.character), recursive, 
            FALSE))
        res <- match(res, lv)
        structure(res, levels = lv, names = nm, class = "factor")
    }
    else .Internal(unlist(x, recursive, use.names))
}
<bytecode: 0x558a410998b0>
<environment: namespace:base>

According to the documentation for .Internal:

Only true R wizards should even consider using this function, and only R developers can add to the list of internal functions.
