
Extract and organize data from subsetted lists in R

I have spent the last few days trying to solve this myself using several different sources of information, including other questions here on Stack Overflow, but failed. I'm a complete beginner, which is probably why I'm struggling so much with this.

I created the dummy data below to illustrate what my original data look like.

list1<-list(path = ".../folder1/folder2/Country_State_Species_Individual1.png",
            matrix1 = cbind(1:3, 1:9),
            matrix2 = cbind(1:3, 1:9),
            matrix3 = cbind(1:3, 1:9))

list2<-list(path = ".../folder1/folder2/Country_State_Species_Individual2.png",
            matrix1 = cbind(1:3, 1:9),
            matrix2 = cbind(1:3, 1:9),
            matrix3 = cbind(1:3, 1:9))

list3<-list(path = ".../folder1/folder2/Country_State_Species_Individual3.png",
            matrix1 = cbind(1:3, 1:9),
            matrix2 = cbind(1:3, 1:9),
            matrix3 = cbind(1:3, 1:9))

general_list <- list(list1, list2, list3)

As you can see, it is a big list (general_list) composed of small lists (list1, list2, list3) that are identical in structure.

My initial goal can be described in two steps:

1 – Sample 6 random rows from each matrix2 and save each of these outputs in a new object.

2 – Rename these objects using the information contained in the original file name stored in the path.

I want to rename the extracted matrices this way because I need to be able to sort the matrices by the variables expressed in the file names (Country, State and especially Individual). But maybe there is a more efficient or practical way to do this.

Would the most advisable way to store these new objects be in a new list?

I would also be happy to receive any suggestions on how to achieve my initial goal and how to optimize the storage of these new objects (keeping in mind that they will be used in further analysis once everything is done).

Best regards!

We loop over 'general_list', extract matrix2 from each element, sample 6 of its rows without replacement, and collect the results in a new list ('out'). We then name the elements of 'out' with the basename of each 'path' element, stripped of its file extension.

out <- lapply(general_list, function(x) {
     x1 <- x$matrix2
     x1[sample(nrow(x1), 6, replace = FALSE),] })
names(out) <- sapply(general_list,
     function(x) tools::file_path_sans_ext(basename(x$path)))
out
#$Country_State_Species_Individual1
#     [,1] [,2]
#[1,]    3    9
#[2,]    2    2
#[3,]    1    7
#[4,]    1    4
#[5,]    3    6
#[6,]    2    8

#$Country_State_Species_Individual2
#     [,1] [,2]
#[1,]    3    3
#[2,]    1    7
#[3,]    3    9
#[4,]    2    2
#[5,]    3    6
#[6,]    1    1

#$Country_State_Species_Individual3
#     [,1] [,2]
#[1,]    3    3
#[2,]    2    2
#[3,]    1    4
#[4,]    2    5
#[5,]    1    7
#[6,]    3    6
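Because sample() draws rows at random, the rows printed above will differ from run to run. If a repeatable draw is needed, one option is to fix the seed first. The following is a minimal sketch, not part of the original answer: the helpers make_item() and sample_rows() and the seed value 42 are hypothetical names and choices introduced only for illustration.

```r
# Rebuild a stand-in for general_list with the same shape as the question's data.
# make_item() is a hypothetical helper, not part of the original post.
make_item <- function(i) {
  list(path = sprintf(".../folder1/folder2/Country_State_Species_Individual%d.png", i),
       matrix1 = cbind(1:3, 1:9),
       matrix2 = cbind(1:3, 1:9),
       matrix3 = cbind(1:3, 1:9))
}
general_list <- lapply(1:3, make_item)

# sample_rows() is a hypothetical wrapper around the answer's lapply() call.
# It resets the global RNG seed, so every call with the same seed gives the same draw.
sample_rows <- function(lst, n = 6, seed = 42) {
  set.seed(seed)
  lapply(lst, function(x) {
    m <- x$matrix2
    # drop = FALSE keeps the result a matrix even if n were 1
    m[sample(nrow(m), n), , drop = FALSE]
  })
}

run1 <- sample_rows(general_list)
run2 <- sample_rows(general_list)
identical(run1, run2)  # TRUE: both runs start from the same seed
```

Whether to fix the seed at all depends on the analysis; it is mainly useful when the sampled rows have to be reported or regenerated later.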

Or, using the tidyverse (dplyr and purrr):

library(dplyr)
library(purrr)
out <- map(general_list, ~  .x %>%
                             pluck('matrix2') %>%
                             as.data.frame %>%
                             sample_n(6) %>%
                             as.matrix)
names(out) <- map_chr(general_list, ~ 
               tools::file_path_sans_ext(basename(.x$path)))
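Since the goal is to sort the sampled matrices by Country, State and Individual, the list names can be split into their parts and kept in a small lookup table next to the list. This is a sketch under two assumptions that the question does not confirm: every name has exactly four fields separated by "_", and no field contains an underscore itself. The objects nms, parts, meta and ord are illustrative names, and the stand-in 'out' below only mimics the shape of the real one.

```r
# Stand-in for 'out': a named list of 6 x 2 matrices, deliberately out of order.
nms <- c("Country_State_Species_Individual2",
         "Country_State_Species_Individual1",
         "Country_State_Species_Individual3")
out <- setNames(lapply(seq_along(nms), function(i) matrix(i, 6, 2)), nms)

# Split each name on "_" (assumes exactly four fields per name).
parts <- do.call(rbind, strsplit(names(out), "_", fixed = TRUE))

# One row of metadata per list element.
meta <- data.frame(name       = names(out),
                   country    = parts[, 1],
                   state      = parts[, 2],
                   species    = parts[, 3],
                   individual = parts[, 4],
                   stringsAsFactors = FALSE)

# Order by country, then state, then individual, and reorder list and table together.
ord <- order(meta$country, meta$state, meta$individual)
out_sorted  <- out[ord]
meta_sorted <- meta[ord, ]
names(out_sorted)
```

Note that order() compares the fields as text, so a label such as Individual10 would sort before Individual2; if the real data have ten or more individuals per group, extracting the number and ordering on it numerically is probably safer.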
