
Separating Data using \n AND \t

I am trying to separate data into columns using "\n" in RStudio and then separate that data further into rows using "\t". So far I have been able to separate the data by "\n", but I can't figure out how to further split it by "\t". I can't find any header names in the data I am using, since it's a table that I downloaded from the MSigDB website. Here's what I have so far:

     matrix_sep_by_enter <- read.table("msigdb.v5.2.symbols.txt", sep = "\n")

How do I further separate this using "\t"?

Thank you!

I'm not entirely sure how you want to parse the MSigDB file. I've downloaded the latest MSigDB GMT file, so I'll show you one possibility based on that file.

  1. Read the GMT file.

     df <- read.table("msigdb.v6.1.symbols.gmt", sep = "\n");

    This creates a data.frame with one column and as many rows as there are lines in the GMT file.

  2. Split every line into substrings based on "\t".

     lst <- apply(df, 1, function(x) unname(unlist(strsplit(x, "\t"))));

    The result is stored in a list of character vectors (of different lengths), where the first entry gives the gene set name, the second entry gives the MSigDB gene set weblink, and the remaining entries are the gene symbols associated with that gene set.

     str(lst, list.len = 5);
     #List of 17786
     # $ : chr [1:195] "AAANWWTGC_UNKNOWN" "http://www.broadinstitute.org/gsea/msigdb/cards/AAANWWTGC_UNKNOWN" "MEF2C" "ATP1B1" ...
     # $ : chr [1:376] "AAAYRNCTG_UNKNOWN" "http://www.broadinstitute.org/gsea/msigdb/cards/AAAYRNCTG_UNKNOWN" "LTBP1" "PLEKHM1" ...
     # $ : chr [1:267] "MYOD_01" "http://www.broadinstitute.org/gsea/msigdb/cards/MYOD_01" "KCNE1L" "FAM126A" ...
     # $ : chr [1:255] "E47_01" "http://www.broadinstitute.org/gsea/msigdb/cards/E47_01" "MLIP" "FAM126A" ...
     # $ : chr [1:251] "CMYB_01" "http://www.broadinstitute.org/gsea/msigdb/cards/CMYB_01" "FAM126A" "C5orf64" ...
     #  [list output truncated]
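
If it helps, the two steps above can also be sketched with readLines() instead of read.table(). This is a variation on the approach above rather than a replacement for it. The two-line sample file, the SET_A/SET_B names, the example.org links and the gene symbols are all made up so the snippet runs without the real MSigDB download; gene_sets and long are likewise just illustrative variable names.

```r
# Hypothetical two-line GMT sample written to a temporary file, so the
# sketch runs without downloading anything from MSigDB.
gmt_file <- tempfile(fileext = ".gmt")
writeLines(
  c("SET_A\thttp://example.org/SET_A\tGENE1\tGENE2\tGENE3",
    "SET_B\thttp://example.org/SET_B\tGENE2\tGENE4"),
  gmt_file
)

# readLines() returns one string per line, so no sep = "\n" is needed.
lines <- readLines(gmt_file)

# Split each line on tabs; fixed = TRUE treats "\t" as a literal tab.
fields <- strsplit(lines, "\t", fixed = TRUE)

# Drop the name and link (entries 1 and 2), keep the gene symbols,
# and name each list element after its gene set.
gene_sets <- lapply(fields, function(x) x[-(1:2)])
names(gene_sets) <- vapply(fields, `[`, character(1), 1)

# Optional: one row per (gene, gene set) pair.
long <- stack(gene_sets)

str(gene_sets)
```

A named list keeps the ragged structure of the GMT file (every gene set has a different number of genes), while the stack() call gives a rectangular table if that is easier to work with downstream.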
