
for loop with dplyr

I have a bunch of files I read in manually as such:

    # gel above replicates

    A_gel <- read.delim("XL1_3_S35_L004_R1_001_w_XL2_3_S37_L004_R1_001_01.basedon.peaks.l2inputnormnew.bed.compressed.bed")
    
    B_gel <- read.delim("XL2_3_S37_L004_R1_001_w_XL2_3_S37_L004_R1_001_01.basedon.peaks.l2inputnormnew.bed.compressed.bed")
    
    C_gel <- read.delim("XL2_3_S37_L004_R1_001_w_XL1_3_S35_L004_R1_001_01.basedon.peaks.l2inputnormnew.bed.compressed.bed")
    
    D_gel <- read.delim("XL1_3_S35_L004_R1_001_w_XL1_3_S35_L004_R1_001_01.basedon.peaks.l2inputnormnew.bed.compressed.bed")
    
    # gel below replicates
    
    A_below_gel <- read.delim("XL1_3b_S36_L004_R1_001_w_XL2_3b_S38_L004_R1_001_01.basedon.peaks.l2inputnormnew.bed.compressed.bed")
    
    B_below_gel <- read.delim("XL2_3b_S38_L004_R1_001_w_XL2_3b_S38_L004_R1_001_01.basedon.peaks.l2inputnormnew.bed.compressed.bed")
    
    C_below_gel <- read.delim("XL2_3b_S38_L004_R1_001_w_XL1_3b_S36_L004_R1_001_01.basedon.peaks.l2inputnormnew.bed.compressed.bed")
    
    D_below_gel <- read.delim("XL1_3b_S36_L004_R1_001_w_XL1_3b_S36_L004_R1_001_01.basedon.peaks.l2inputnormnew.bed.compressed.bed")

I would like to rename the columns of all these data frames and sort each one by its start column, with something like this:

    colnames(A_gel) <- c("Chromosome", "Start", "End", "LogPVal", "LogFC", "Strand")

    A_gel <- A_gel %>%
        arrange(Start)

Instead of repeating this for every file, I would like to use a for loop over all the files in R.

Never create multiple variables that follow the same naming pattern. The properly supported solution to this general problem is to use a list: instead of having variables A_gel, B_gel, …, you have one variable gel, which is a list that contains your individual data.frames. You can also assign names to the individual items, though in your case that doesn't seem necessary.

Then you can use e.g. lapply to run over your file paths and read the data from the different files into that list:

    gel = lapply(gel_filenames, read.delim)
    below_gel = lapply(below_gel_filenames, read.delim)
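The answer does not show where gel_filenames and below_gel_filenames come from. A minimal base-R sketch, assuming all files sit in one directory and that the two groups can be told apart by the `_3_S` versus `_3b_S` part of the name, as in the question's filenames. The directory, the tiny demo files, and the patterns are assumptions made so the example runs on its own:

```r
# Hypothetical setup: write two tiny tab-separated files into a temp directory.
# Real data would already be on disk.
bed_dir <- file.path(tempdir(), "bed_demo")
dir.create(bed_dir, showWarnings = FALSE)
demo <- data.frame(chr = c("chr1", "chr1"), start = c(500, 100), end = c(600, 200))
write.table(demo, file.path(bed_dir, "XL1_3_S35_demo.compressed.bed"),
            sep = "\t", quote = FALSE, row.names = FALSE)
write.table(demo, file.path(bed_dir, "XL1_3b_S36_demo.compressed.bed"),
            sep = "\t", quote = FALSE, row.names = FALSE)

# Collect the paths by pattern instead of typing each one out.
all_files <- list.files(bed_dir, pattern = "\\.compressed\\.bed$", full.names = TRUE)
gel_filenames       <- grep("_3_S",  all_files, value = TRUE)
below_gel_filenames <- grep("_3b_S", all_files, value = TRUE)

# One list per group; lapply keeps the names of its input,
# so each data frame stays labelled with the file it came from.
gel       <- lapply(setNames(gel_filenames, basename(gel_filenames)), read.delim)
below_gel <- lapply(setNames(below_gel_filenames, basename(below_gel_filenames)), read.delim)
```

Individual tables are then reached with gel[[1]] or by name, rather than through separate A_gel, B_gel, … variables.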

… and likewise you can put your arrangement code into a function and apply that, changing the above to:

    read_bed = function (filename) {
        read.delim(filename) %>%
            setNames(c("Chromosome", "Start", "End", "LogPVal", "LogFC", "Strand")) %>%
            arrange(Start)
    }

    # …

    gel = lapply(gel_filenames, read_bed)
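Since the question asked for a for loop, here is a sketch of the equivalent loop next to the lapply call, run on a throwaway file so that it is self-contained. The demo file, its header row, and its values are made up for illustration; read_bed is repeated from the answer.

```r
library(dplyr)

read_bed <- function(filename) {
    read.delim(filename) %>%
        setNames(c("Chromosome", "Start", "End", "LogPVal", "LogFC", "Strand")) %>%
        arrange(Start)
}

# Made-up demo file with a header row and six columns (assumed layout).
demo_file <- tempfile(fileext = ".bed")
write.table(
    data.frame(c = c("chr1", "chr1", "chr2"), s = c(300, 100, 200), e = c(350, 150, 250),
               p = c(1.5, 2.0, 3.1), fc = c(0.4, 1.2, 0.9), st = c("+", "-", "+")),
    demo_file, sep = "\t", quote = FALSE, row.names = FALSE
)
gel_filenames <- c(demo_file, demo_file)

# for-loop version: preallocate the list, then fill it by index.
gel_loop <- vector("list", length(gel_filenames))
for (i in seq_along(gel_filenames)) {
    gel_loop[[i]] <- read_bed(gel_filenames[[i]])
}

# lapply version from the answer: same result, less bookkeeping.
gel <- lapply(gel_filenames, read_bed)
```

One thing to check on real data: read.delim defaults to header = TRUE, so if the BED files have no header line, the first record is consumed as column names. Passing header = FALSE to read.delim avoids that.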

Better yet, use purrr::map_dfr to read all data into a single combined table:

    gel = gel_filenames %>%
        setNames(., .) %>%
        map_dfr(read_bed, .id = 'Filename')

(The setNames(., .) step is necessary because map_dfr fills the added ID column with the names of the input vector; if the vector has no names, it uses the position index instead.)
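In purrr 1.0.0 and later, map_dfr is marked as superseded, and the documented replacement is map() followed by list_rbind(). A sketch of the same step in that style, assuming purrr >= 1.0.0 and R >= 4.1 (for the native pipe). The reader function and the files here are simplified stand-ins, not the answer's read_bed:

```r
library(purrr)

# Stand-in reader for illustration only; the answer's read_bed would go here.
read_demo <- function(filename) read.delim(filename)

# Two made-up single-column files.
f1 <- tempfile(fileext = ".bed")
f2 <- tempfile(fileext = ".bed")
write.table(data.frame(Start = c(2, 1)), f1, sep = "\t", quote = FALSE, row.names = FALSE)
write.table(data.frame(Start = 3), f2, sep = "\t", quote = FALSE, row.names = FALSE)
gel_filenames <- c(f1, f2)

gel <- gel_filenames |>
    set_names() |>                      # names default to the values themselves
    map(read_demo) |>                   # named list of data frames
    list_rbind(names_to = "Filename")   # list names become the Filename column
```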

This will create one master table for the “GEL” data, with an added ID column holding the original filename (you'll probably want to extract just some ID from that, using tidyr::extract).
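A sketch of that last step, using two of the question's filenames in a made-up table. The regular expression and the output column names SampleA and SampleB are assumptions about how the filenames are structured (two sample IDs joined by `_w_`), not something the original post specifies:

```r
library(tidyr)

# Made-up combined table; in practice this would be the map_dfr result.
gel <- data.frame(
    Filename = c(
        "XL1_3_S35_L004_R1_001_w_XL2_3_S37_L004_R1_001_01.basedon.peaks.l2inputnormnew.bed.compressed.bed",
        "XL2_3_S37_L004_R1_001_w_XL2_3_S37_L004_R1_001_01.basedon.peaks.l2inputnormnew.bed.compressed.bed"
    ),
    Start = c(100, 200)
)

# Assumed naming scheme: <sample>_L004_..._w_<sample>_L004_...
# Each capture group becomes one new column; Filename is dropped by default.
gel_ids <- extract(
    gel, Filename,
    into = c("SampleA", "SampleB"),
    regex = "^(XL\\d+_3b?_S\\d+)_.*_w_(XL\\d+_3b?_S\\d+)_"
)
```

Pass remove = FALSE to keep the full filename alongside the extracted IDs.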
