
Separating Data using \n AND \t

I am trying to separate data into columns using "\n" in RStudio and then separate that data further into rows using "\t". So far I have been able to separate the data by "\n", but I can't figure out how to further split the data by "\t". I can't find any header names in the data I am using, since it's a table that I downloaded from the MSigDB website. Here's what I have so far:

     matrix_sep_by_enter <- read.table("msigdb.v5.2.symbols.txt", sep = "\n")

How do I further separate this using "\t"?

Thank you!

I'm not entirely sure how you want to parse the MSigDB. I've downloaded the latest MSigDB GMT file, so I'll show you a possibility based on that file.

  1. Read the GMT file.

     df <- read.table("msigdb.v6.1.symbols.gmt", sep = "\n")

    This creates a data.frame with one column and as many rows as there are lines in the GMT file.

  2. Split every line into substrings based on "\t".

     lst <- apply(df, 1, function(x) unname(unlist(strsplit(x, "\t"))))

    The result is stored in a list of character vectors (of different lengths), where the first entry gives the gene set name, the second entry the MSigDB gene set weblink, and the remaining entries are the gene symbols associated with that gene set.

     str(lst, list.len = 5)
     #List of 17786
     # $ : chr [1:195] "AAANWWTGC_UNKNOWN" "http://www.broadinstitute.org/gsea/msigdb/cards/AAANWWTGC_UNKNOWN" "MEF2C" "ATP1B1" ...
     # $ : chr [1:376] "AAAYRNCTG_UNKNOWN" "http://www.broadinstitute.org/gsea/msigdb/cards/AAAYRNCTG_UNKNOWN" "LTBP1" "PLEKHM1" ...
     # $ : chr [1:267] "MYOD_01" "http://www.broadinstitute.org/gsea/msigdb/cards/MYOD_01" "KCNE1L" "FAM126A" ...
     # $ : chr [1:255] "E47_01" "http://www.broadinstitute.org/gsea/msigdb/cards/E47_01" "MLIP" "FAM126A" ...
     # $ : chr [1:251] "CMYB_01" "http://www.broadinstitute.org/gsea/msigdb/cards/CMYB_01" "FAM126A" "C5orf64" ...
     # [list output truncated]
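If a named list of gene sets is more convenient than the raw character vectors, the two steps above could be combined along these lines. This is only a sketch and uses a different route from the answer's code: it reads the file with readLines() instead of read.table(), and it writes a tiny made-up GMT file to a temporary path so that it runs on its own. The set names, URLs, and gene symbols (SET_A, GENE1, and so on) and the variable names are placeholders, not real MSigDB content.

```r
# Write a tiny, made-up GMT file so the example is self-contained.
# Set names, URLs and gene symbols are illustrative placeholders.
gmt_path <- tempfile(fileext = ".gmt")
writeLines(c(
  "SET_A\thttp://example.org/SET_A\tGENE1\tGENE2\tGENE3",
  "SET_B\thttp://example.org/SET_B\tGENE2\tGENE4"
), gmt_path)

# Read the file as one character string per line.
gmt_lines <- readLines(gmt_path)

# Split each line on tabs; fixed = TRUE treats "\t" as a literal tab.
fields <- strsplit(gmt_lines, "\t", fixed = TRUE)

# Field 1 is the gene set name, field 2 the link, the rest are gene symbols.
gene_sets <- lapply(fields, function(x) x[-(1:2)])
names(gene_sets) <- vapply(fields, function(x) x[1], character(1))

# Optional long format: one row per (gene set, gene) pair.
long <- data.frame(
  gene_set = rep(names(gene_sets), lengths(gene_sets)),
  gene = unlist(gene_sets, use.names = FALSE),
  stringsAsFactors = FALSE
)
```

With a real file, gmt_path would simply be the path to the downloaded GMT file. One reason to prefer strsplit() on the lines directly is that it always returns a list, whereas apply() returns a matrix when every line happens to split into the same number of fields.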
