Splitting the data.frame into 2 columns

I have a FASTA file that I read into R using "read.delim". The resulting data.frame looks like the following:

>tm_sd_1256_2_1
MJAKDHRZTASDJASJDKASJDURUJDFLSDJFSDIFJKSDFKSJDFLJSDLFD
ASDJASDJ
>tm_sd_5672_1_2
AIZZTQBCSKLKDSHDADBCMSJHKQUWIRJHJJKKDLJSGDHASGDZGDHGHAGSDZASDASDVASGASDHGCAHGS
SADASDA
>tm_sd_543_1_2
MUZTREQWERNBVXCYMNMVHZTOPOPOEURDASDOPOQWEUZQUIZRZIRIEIWUEWASDHASHDAHSDHAKHHSDHASHDJASHDAHUWIEUROWUOERUOWEUROOWWWW
>tm_sd_212_0_2
MTZTPSPASDASZDATSZGZASDZATSDASDARSDASDASDASDASDZTASZDTAXAYXFASTDRASRZWUEWERZWERZ
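Why read.delim() gives the wrong shape here: it treats every physical line as a row and, because its default is header = TRUE, it uses the first FASTA header as the column name. Headers and wrapped sequence lines therefore end up interleaved in a single column. A minimal sketch, assuming a small made-up two-record FASTA written to a temporary file (the records and the name fasta_lines are invented for illustration):

```r
# Hypothetical two-record FASTA; the second sequence wraps onto two lines
fasta_lines <- c(">seq_a", "MKTAYIAK", ">seq_b", "GGHHKK", "LLMM")
tmp <- tempfile(fileext = ".fasta")
writeLines(fasta_lines, tmp)

# read.delim() sees 5 lines: the first becomes the header, the other 4 become rows
fasta_seq <- read.delim(tmp)
print(dim(fasta_seq))    # 4 rows, 1 column
print(names(fasta_seq))  # the first FASTA header, mangled by make.names()
print(fasta_seq[[1]])    # later headers and sequence lines mixed in one column
```

So the number of rows depends on how the sequences happen to be wrapped, not on how many sequences the file contains.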

I would like to split this data.frame into two columns: one for the sequence names and the other for the respective sequences.

I created a data.frame and stored the sequence names in one column, but when I tried to store the corresponding sequences in another column, R threw an error saying that the replacement has 55 rows and the data has 436 rows.

Here is the code I tried, which produced that error:

new_DF=NULL
new_DF$names=as.data.frame(names(fasta_seq))
new_DF$sequences=as.data.frame(fasta_seq)
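The error message comes from R's $<- method for data frames, which only accepts a new column whose length equals the number of rows, or divides it evenly (in which case the values are recycled). Since each sequence spans several lines of the file, a data frame with one row per line has far more rows than there are sequence names. A sketch that reproduces the error with placeholder data sized to match the message, assuming (the question does not show fasta_seq) that the 436 rows are lines of the file and the 55 values are sequence names; the names by_line and seq_names are hypothetical:

```r
# Placeholder data frame with one row per line of the file (436 rows)
by_line <- data.frame(line = paste0("line_", 1:436))
# One placeholder name per sequence (55 names)
seq_names <- paste0(">seq_", 1:55)

# Assigning 55 values to a 436-row data frame fails: 436 is not a multiple of 55
msg <- tryCatch({
  by_line$names <- seq_names
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)
```

The fix is therefore not to force the two vectors into one frame, but to collapse each sequence's lines into a single string first, so that there is exactly one sequence per name.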

How can I do this in R? Any guidance would be appreciated.

Try:

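# Read the FASTA file one line per element
lines <- readLines('deena.fasta')
# TRUE for header lines, i.e. lines that start with '>'
indx <- grepl('^>', lines)
# cumsum(indx) assigns every line to the record of the most recent header;
# within each record, drop the header (first index) and join the remaining lines
Sequence <- tapply(seq_along(indx), cumsum(indx), FUN=function(x)
            paste(lines[tail(x, -1)], collapse=""))
d1 <- data.frame(names=lines[indx], Sequence, stringsAsFactors=FALSE)
head(d1, 2)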
#             names
# 1 >tm_sd_1256_2_1
# 2 >tm_sd_5672_1_2
#                                                                                Sequence
# 1                        MJAKDHRZTASDJASJDKASJDURUJDFLSDJFSDIFJKSDFKSJDFLJSDLFDASDJASDJ
# 2 AIZZTQBCSKLKDSHDADBCMSJHKQUWIRJHJJKKDLJSGDHASGDZGDHGHAGSDZASDASDVASGASDHGCAHGSSADASDA
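The same cumsum() grouping idea can be wrapped in a small reusable function. This is a sketch, not part of the original answer: the function name parse_fasta_lines is hypothetical, it takes a character vector (so it can be tried without a file), strips the leading ">" from the names, drops blank lines and anything before the first header, and uses split() plus vapply() in place of tapply().

```r
# Hypothetical helper: turn the lines of a FASTA file into a two-column data.frame
parse_fasta_lines <- function(lines) {
  lines <- trimws(lines)                 # also removes stray \r from Windows line endings
  lines <- lines[nzchar(lines)]          # drop blank lines
  is_header <- startsWith(lines, ">")    # header lines start with '>'
  record_id <- cumsum(is_header)         # each line gets the index of its record
  keep <- record_id > 0                  # ignore anything before the first header
  lines <- lines[keep]
  is_header <- is_header[keep]
  record_id <- record_id[keep]

  # Group the non-header lines by record; explicit levels keep empty records
  n_records <- sum(is_header)
  seq_lines <- split(lines[!is_header],
                     factor(record_id[!is_header], levels = seq_len(n_records)))
  # Concatenate each record's lines into a single sequence string
  sequences <- vapply(seq_lines, paste, character(1), collapse = "")

  data.frame(names = sub("^>", "", lines[is_header]),
             sequence = unname(sequences),
             stringsAsFactors = FALSE)
}

# Usage with made-up records; for a real file: parse_fasta_lines(readLines("deena.fasta"))
example <- c(">seq_a", "MKTAYIAK", ">seq_b", "GGHHKK", "LLMM")
d2 <- parse_fasta_lines(example)
print(d2)
```

Because the grouping factor has one level per header, a header followed by no sequence lines still gets a row, with an empty string as its sequence.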

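Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site or link to the original. For any questions, contact yoyou2525@163.com.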
