
Convert R data frame into list of vectors

I have a data frame (imported from an Excel worksheet in which I wrote lists of strings row by row) and want to convert its rows into a list of vectors, where each vector contains the non-missing cell values for that row.

For example:

#Sample data frame
dfX <- data.frame(C0 = c(1,2,3),
              C1 = c("Apple","Apple","Pear"),
              C2 = c("Banana","Orange", "Lemon"),
              C3 = c("Pear","Melon", ""))

From this I would like to generate the following list:

myList = list(c("Apple","Banana", "Pear"),
          c("Apple","Orange", "Melon"),
          c("Pear","Lemon"))

Note that the third vector is truncated to two elements because its C3 cell contains an empty string. Also note that the index column (C0) is dropped.

I have seen some examples that convert the data frame to a matrix, use the split function, and then paste the results into the global environment, e.g.

list2env(setNames(split(as.matrix(dfX),
                    row(dfX)), paste0("Row",1:3)),
                    envir=.GlobalEnv)

But I was wondering whether there is (a) a newer tidyverse function for handling this and (b) a way to populate a list directly (I later want to lapply a function over that list). Ideally the missing values would also be handled on the way into the list.

As you are interested in a tidyverse approach, one option would be

library(tidyverse)

dfX %>%
  group_split(C0) %>% #Or use split(.$C0) if `dplyr` is not updated
  map(~discard(flatten_chr(.), . == "")[-1])

#[[1]]
#[1] "Apple"  "Banana" "Pear"  

#[[2]]
#[1] "Apple"  "Orange" "Melon" 

#[[3]]
#[1] "Pear"  "Lemon"

group_split() is available from dplyr 0.8.0 onwards. This also assumes that C0 is unique for every row; within each row we discard any value that is equal to the empty string ("").


Or, in base R, a combination of split and lapply would also work.

lapply(split(dfX[-1], dfX$C0), function(x) x[x != ""])

#$`1`
#[1] "Apple"  "Banana" "Pear"  

#$`2`
#[1] "Apple"  "Orange" "Melon" 

#$`3`
#[1] "Pear"  "Lemon"
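
Both the group_split and the split versions above group on C0, so they rely on C0 being unique per row, and they only drop empty strings. The following is a minimal sketch, not a definitive implementation, that splits on row position instead and also drops NA values. It assumes that blank Excel cells may arrive as NA rather than "" (that depends on the import step, which the question does not show); dfY, row_values and rowList are hypothetical names introduced for illustration.

```r
# Hypothetical data: C0 is repeated and one blank cell is NA instead of "".
dfY <- data.frame(C0 = c(1, 1, 2),
                  C1 = c("Apple", "Apple", "Pear"),
                  C2 = c("Banana", "Orange", "Lemon"),
                  C3 = c("Pear", "Melon", NA),
                  stringsAsFactors = FALSE)

# Hypothetical helper: keep the non-missing, non-empty values of one row.
row_values <- function(row) {
  x <- unlist(row, use.names = FALSE)
  x[!is.na(x) & x != ""]
}

# Split on row position rather than on C0, so rows that share a C0 value
# are not merged; unname() drops the "1", "2", "3" names added by split().
rowList <- unname(lapply(split(dfY[-1], seq_len(nrow(dfY))), row_values))
rowList
```

Setting stringsAsFactors = FALSE keeps the columns as character vectors on R versions before 4.0.0 as well, which is what unlist() needs here.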

Another base R option is apply with MARGIN = 1:

apply(dfX[-1], 1, function(x) x[x != ""])
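
One thing to watch with apply: it simplifies its result, so it only returns a list when the rows keep different numbers of values. If every row happens to keep the same number, it returns a matrix instead. The sketch below illustrates this on the sample data and shows one possible way around it; keep_non_empty and rows_to_list are hypothetical helper names, not part of the original answer.

```r
dfX <- data.frame(C0 = c(1, 2, 3),
                  C1 = c("Apple", "Apple", "Pear"),
                  C2 = c("Banana", "Orange", "Lemon"),
                  C3 = c("Pear", "Melon", ""),
                  stringsAsFactors = FALSE)

# Hypothetical helper: drop empty strings from a character vector.
keep_non_empty <- function(x) x[x != ""]

# Rows 1 and 2 both keep three values, so apply() simplifies to a matrix.
simplified <- apply(dfX[1:2, -1], 1, keep_non_empty)
is.matrix(simplified)  # TRUE

# Hypothetical helper: walk the rows of a character matrix with lapply(),
# which never simplifies, so the result is always a list.
rows_to_list <- function(df) {
  m <- as.matrix(df)
  lapply(seq_len(nrow(m)), function(i) keep_non_empty(unname(m[i, ])))
}

rowList <- rows_to_list(dfX[-1])
rowList
```

Since the goal is to lapply over the result later, a helper that always returns a list is probably the safer choice when the number of non-empty cells per row is not known in advance.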

A further base R option is by:

by(dfX, dfX$C0, function(x) unlist(x[x != ''][-1]))
#dfX$C0: 1
#[1] "Apple"  "Banana" "Pear"
#------------------------------------------------------------
#dfX$C0: 2
#[1] "Apple"  "Orange" "Melon"
#------------------------------------------------------------
#dfX$C0: 3
#[1] "Pear"  "Lemon"

by returns a "dressed" list, that is, a list carrying extra attributes such as a class and dimnames. Ignoring those attributes, it is the same as your expected myList.
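
If a plain list is needed, the extra attributes can be stripped. This is a small sketch of one way to do that, assuming the sample dfX from the question; res and plainList are hypothetical names.

```r
dfX <- data.frame(C0 = c(1, 2, 3),
                  C1 = c("Apple", "Apple", "Pear"),
                  C2 = c("Banana", "Orange", "Lemon"),
                  C3 = c("Pear", "Melon", ""),
                  stringsAsFactors = FALSE)

# Same call as above; the result has class "by" plus dim, dimnames and call.
res <- by(dfX, dfX$C0, function(x) unlist(x[x != ""][-1]))
class(res)

# Dropping every attribute leaves an ordinary unnamed list.
plainList <- res
attributes(plainList) <- NULL
plainList
```

The plain list can then be passed to lapply like any other list.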

