
Converting a variable stored in a list into a list of character vectors in R

I have a subset of data that originates from a very large dataset. I have split this subset of data into a list of dataframes so that each case/id is a separate element within the list. Each element is named with the case/id. I then remove all variables from each dataframe element to be left with only one variable - called 'state'. It is currently a factor with 7 levels.

I am attempting to turn this list of 'state' elements into a list of character vectors. The element below is the first in the list, and included are the row numbers (which originate from the much larger original dataset).

[[1]]
        state
104246 active
104247   rest
104248 active
104249 active
.
.
.
104315 active
104316 active
104317   rest
104318   rest

I am trying to turn this simply into a character vector that would look like this:

[1] "active" "rest" "active" "active" ........... "active" "active" "rest" "rest"

It seems simple. I have tried doing things like (where 'temp' is the list name):

as.vector(as.matrix(temp))   

This returns something like this:

         [,1]  
    id1  List,1
    id2  List,1
    id3  List,1
    id4  List,1

When I look at each element from this they basically appear to be still in longform.

Alternatively, I tried directly converting to a character:

as.vector(as.character(temp))

But this does not come back in the ideal format (though I guess I could hack it to convert the factor level numbers to words; note that in the large dataset there are 7 levels of the factor 'state'):

[1] "list(state = c(1, 4, 1, 1, 1, 1, 1, 4, 4, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 4, 1, 6, 1, 4, 4, 1, 1, 1, 4,     1, 1, 1, 6, 4, 1, 1, 1, 1, 1, 4, 4, 1, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 4, 4, 4, 1, 1, 1, 1, 4, 4, 1, 1, 1, 1,     1, 1, 1, 4, 4))"

I also tried making the variable 'state' which is a factor a character variable prior to conversion, but that didn't help.

Here is the data for a reproducible example. It contains two elements in the list 'temp' only in this example:

temp<-list(structure(list(state = structure(c(1L, 4L, 1L, 1L, 1L, 1L, 
                                           1L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 4L, 4L, 1L, 
                                           6L, 1L, 4L, 4L, 1L, 1L, 1L, 4L, 1L, 1L, 1L, 6L, 4L, 1L, 1L, 1L, 
                                           1L, 1L, 4L, 4L, 1L, 4L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
                                           4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 4L, 4L, 1L, 1L, 1L, 1L, 1L, 1L, 
                                           1L, 4L, 4L), .Label = c("active", "active2", "active3", "rest", "rest2", 
                                                                   "stop", "stop2"), class = "factor")), .Names = "state", row.names = 104246:104318, class = "data.frame"), 
        structure(list(state = structure(c(1L, 4L, 4L, 4L, 1L, 1L, 
                                           1L, 4L, 4L, 4L, 4L, 1L, 4L, 4L, 4L, 1L, 1L, 6L, 4L, 1L, 4L, 
                                           4L, 4L, 1L, 4L, 1L, 1L, 1L), .Label = c("active", "active2", 
                                                                                   "active3", "rest", "rest2", "stop", "stop2"), class = "factor")), .Names = "state", row.names = 950:977, class = "data.frame"))



str(temp)
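A note on why the second attempt returns that long string: calling as.character() on a list appears to deparse each non-scalar element into a single string, which is where the integer factor codes come from, whereas calling it on the factor column itself returns the level labels. A minimal sketch, assuming a small hypothetical list `toy` that stands in for 'temp' (the names `toy`, `deparsed` and `labels` are illustrative only):

```r
# Hypothetical toy list standing in for 'temp' (not the asker's data)
toy <- list(
  id1 = data.frame(state = factor(c("active", "rest", "active"),
                                  levels = c("active", "rest", "stop")))
)

# On the list: the whole data frame element becomes one deparsed string
deparsed <- as.character(toy)
length(deparsed)

# On the factor column: the level labels come back as a character vector
labels <- as.character(toy[[1]]$state)
labels
```

So the conversion has to be applied to each 'state' column rather than to the list as a whole.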

This could be a good opportunity to use rapply:

x <- rapply(temp, as.character, how = "replace")
str(x)
# List of 2
#  $ :List of 1
#   ..$ state: chr [1:73] "active" "rest" "active" "active" ...
#  $ :List of 1
#   ..$ state: chr [1:28] "active" "rest" "rest" "rest" ...

If you wanted to flatten it out further, then you can use unlist(..., recursive = FALSE).

str(unlist(rapply(temp, as.character, how = "replace"), recursive=FALSE))
# List of 2
#  $ state: chr [1:73] "active" "rest" "active" "active" ...
#  $ state: chr [1:28] "active" "rest" "rest" "rest" ...

This second approach would give you the same results as @Vlo's approach, but would be more efficient because it calls unlist just once. To see how different it could be, here are some benchmarks on a larger list:

x <- replicate(1000, temp)   ## A larger list

## Vlo's approach
fun1 <- function() {
  lapply(x, function(y) as.character(unlist(y, use.names = FALSE)))
} 

## My approach
fun2 <- function() {
  unlist(rapply(x, as.character, how = "replace"), 
         recursive=FALSE, use.names=FALSE)
} 

## Benchmarking
library(microbenchmark)
microbenchmark(fun1(), fun2(), times = 50)
# Unit: milliseconds
#    expr       min        lq    median        uq       max neval
#  fun1() 435.84992 475.17146 497.63325 533.68488 1570.6814    50
#  fun2()  50.90449  55.79023  63.85908  70.78956  111.0357    50

## Comparison of results
all.equal(fun1(), fun2(), check.attributes=FALSE)
# [1] TRUE
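
One detail the flattening step leaves open is what happens when the list elements are named with the case/id, as described in the question. A small sketch, assuming a hypothetical named list `toy` (the ids `id1` and `id2` are made up for illustration): without use.names = FALSE, unlist(..., recursive = FALSE) builds names from the id and the column name, so the plain ids may need to be restored afterwards.

```r
# Hypothetical named list standing in for 'temp' (ids are illustrative)
toy <- list(
  id1 = data.frame(state = factor(c("active", "rest"))),
  id2 = data.frame(state = factor(c("rest", "stop", "rest")))
)

flat <- unlist(rapply(toy, as.character, how = "replace"), recursive = FALSE)
names(flat)                 # names are combined from id and column name

names(flat) <- names(toy)   # restore the plain ids
flat[["id2"]]
```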

Try this code:

as.vector(unlist(temp[[1]]))

L = lapply(temp, function(x) as.character(unlist(x)))

For the individual vectors, just use L[[1]], L[[2]].
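
If the list elements carry the case/id as names, lapply keeps those names, so each vector can be pulled out by id. A minimal sketch, assuming a hypothetical named list `toy` and that every data frame has a column called 'state', as in the question:

```r
# Hypothetical named list standing in for 'temp' (ids are illustrative)
toy <- list(
  id1 = data.frame(state = factor(c("active", "rest"))),
  id2 = data.frame(state = factor(c("rest", "stop", "rest")))
)

# Convert the 'state' column of each data frame to a character vector
L <- lapply(toy, function(d) as.character(d$state))

names(L)      # the ids are preserved
L[["id1"]]    # look up one case by its id

# Check that every element is now a character vector
all_chr <- all(vapply(L, is.character, logical(1)))
all_chr
```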

