Data Wrangling in R

I'm trying to subset my data into new data frames in order to perform my analysis.

I have data frames for 134 samples. Each contains a lot of information, but I'm only interested in the type, name, and expression columns. How can I loop (or otherwise iterate) through all of these samples, subset each one to my columns of interest, and then collate the results into a single data frame?

Specifically, I would like to subset the data by type into separate tables (i.e., one for proteins and one for mRNA). Then I would like to include the protein (or mRNA) expression of all of the samples in a single data frame.

Any help or direction to further resources would be much appreciated! Thanks

Example code

sample1 <- data.frame(type = c("protein", "mRNA"),
                      name = c("DIABLO", "X1345"),
                      expression = c("1.23", "4.265"))

sample2 <- data.frame(type = c("protein", "mRNA"),
                      name = c("DIABLO", "X1345"),
                      expression = c("3.24", "5.33"))

sample3 <- data.frame(type = c("protein", "mRNA"),
                      name = c("DIABLO", "X1345"),
                      expression = c("2.56", "8.11"))

Combine all the data into one data frame, then split on the type column to get two separate data frames.

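library(dplyr)

# mget() fetches sample1..sample3 by name as a named list;
# .id = 'sample' records which data frame each row came from
bind_rows(mget(paste0('sample', 1:3)), .id = 'sample') %>%
  split(.$type) %>%          # one data frame per value of type
  list2env(.GlobalEnv)       # creates the objects mRNA and protein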

mRNA

#   sample type  name expression
#2 sample1 mRNA X1345      4.265
#4 sample2 mRNA X1345       5.33
#6 sample3 mRNA X1345       8.11

protein
#   sample    type   name expression
#1 sample1 protein DIABLO       1.23
#3 sample2 protein DIABLO       3.24
#5 sample3 protein DIABLO       2.56

Although you might not need it, I added an extra column ( sample ) to identify which data frame each row comes from.
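The question also mentions that the real data frames hold many more columns than the three of interest. A minimal sketch of how the same pipeline might handle that is below. The extra `batch` column is hypothetical and only there to show the column selection, and converting `expression` to numeric assumes every value parses as a number. The result is kept in a list (`by_type`) rather than written to the global environment, which is a design choice, not a requirement.

```r
library(dplyr)

# Hypothetical samples with an extra column (batch) that is not needed
sample1 <- data.frame(type = c("protein", "mRNA"),
                      name = c("DIABLO", "X1345"),
                      expression = c("1.23", "4.265"),
                      batch = "A")
sample2 <- data.frame(type = c("protein", "mRNA"),
                      name = c("DIABLO", "X1345"),
                      expression = c("3.24", "5.33"),
                      batch = "B")

# Collect the data frames into a named list; with 134 samples use 1:134
samples <- mget(paste0("sample", 1:2))

combined <- bind_rows(samples, .id = "sample") %>%
  select(sample, type, name, expression) %>%    # keep only columns of interest
  mutate(expression = as.numeric(expression))   # stored as text in the example

# Named list with one data frame per type
by_type <- split(combined, combined$type)
by_type$protein
```

Individual tables are then available as `by_type$protein` and `by_type$mRNA`.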

First put all of the sample data frames into a list, then extract the rows for each type:

samples <- list(sample1, sample2, sample3)
# Alternate approach
# samples <- lapply(paste0("sample", 1:3), get)
proteins <- do.call(rbind, lapply(samples, function(x) x[1, ]))
proteins
#      type   name expression
# 1 protein DIABLO       1.23
# 2 protein DIABLO       3.24
# 3 protein DIABLO       2.56
mRNAs <- do.call(rbind, lapply(samples, function(x) x[2, ]))
mRNAs
#    type  name expression
# 2  mRNA X1345      4.265
# 21 mRNA X1345       5.33
# 22 mRNA X1345       8.11
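The code above picks rows by position, so it assumes the protein is always row 1 and the mRNA always row 2 in every sample. If that cannot be guaranteed, a base R sketch along the following lines filters on the value of `type` instead. The names `tagged` and `all_rows` are illustrative, and the numeric conversion assumes R 4.0 or later, where `data.frame()` keeps strings as character vectors.

```r
sample1 <- data.frame(type = c("protein", "mRNA"),
                      name = c("DIABLO", "X1345"),
                      expression = c("1.23", "4.265"))
sample2 <- data.frame(type = c("protein", "mRNA"),
                      name = c("DIABLO", "X1345"),
                      expression = c("3.24", "5.33"))
sample3 <- data.frame(type = c("protein", "mRNA"),
                      name = c("DIABLO", "X1345"),
                      expression = c("2.56", "8.11"))

samples <- list(sample1 = sample1, sample2 = sample2, sample3 = sample3)

# Keep only the columns of interest and tag each row with its sample name
tagged <- Map(function(df, id) {
  cbind(sample = id, df[, c("type", "name", "expression")])
}, samples, names(samples))

all_rows <- do.call(rbind, tagged)
rownames(all_rows) <- NULL
all_rows$expression <- as.numeric(all_rows$expression)

# Select by the value in type, not by row position
proteins <- all_rows[all_rows$type == "protein", ]
mRNAs    <- all_rows[all_rows$type == "mRNA", ]
```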
