How to write a fasta file with multiple sequences in it within R
I have been extracting fasta sequences from an online database (UniProt) by accession number, using the following library:
install.packages("protr")
library("protr")
IDs <- c( "xxxx","AAAAA")
Proteins_IDs <- getUniProt(IDs)
#Test for this
Proteins_IDs
This works perfectly to grab the sequences of interest in a fasta format that I can then write out. The problem I have is with writing the multiple sequences into ONE merged fasta file.
Currently, I have worked out a method of writing a separate fasta file for each sequence that I grabbed, using the code below:
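library(seqinr)  # assumed source of write.fasta(); the post does not name it
for (i in seq_along(Proteins_IDs)) {
  # one file per accession, named after the ID rather than the sequence itself
  write.fasta(Proteins_IDs[[i]], names = IDs[i], file.out = paste0(IDs[i], ".fasta"), as.string = TRUE)
}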
The problem is that this creates an individual fasta file for each sequence rather than one combined, larger file containing multiple sequences.
Use the functions of the Biostrings package from Bioconductor when dealing with fasta files, and in general any kind of "biological" (DNA, RNA, AA) strings, in R:
library(protr)
IDs <- c("P00750", "P00751", "P00752")
Proteins_IDs <- getUniProt(IDs)
names(Proteins_IDs) <- IDs
library(Biostrings)
multifasta <- Biostrings::AAStringSet(unlist(Proteins_IDs))
Biostrings::writeXStringSet(multifasta, "your_multifasta.fa")
Output in the file:
>P00750
MDAMKRGLCCVLLLCGAVFVSPSQEIHARFRRGARSYQVICRDEKTQMIYQQHQSWLRPVLRSNRVEYCWCNSGRAQCHS
VPVKSCSEPRCFNGGTCQQALYFSDFVCQCPEGFAGKCCEIDTRATCYEDQGISYRGTWSTAESGAECTNWNSSALAQKP
YSGRRPDAIRLGLGNHNYCRNPDRDSKPWCYVFKAGKYSSEFCSTPACSEGNSDCYFGNGSAYRGTHSLTESGASCLPWN
SMILIGKVYTAQNPSAQALGLGKHNYCRNPDGDAKPWCHVLKNRRLTWEYCDVPSCSTCGLRQYSQPQFRIKGGLFADIA
SHPWQAAIFAKHRRSPGERFLCGGILISSCWILSAAHCFQERFPPHHLTVILGRTYRVVPGEEEQKFEVEKYIVHKEFDD
DTYDNDIALLQLKSDSSRCAQESSVVRTVCLPPADLQLPDWTECELSGYGKHEALSPFYSERLKEAHVRLYPSSRCTSQH
LLNRTVTDNMLCAGDTRSGGPQANLHDACQGDSGGPLVCLNDGRMTLVGIISWGLGCGQKDVPGVYTKVTNYLDWIRDNM
RP
>P00751
MGSNLSPQLCLMPFILGLLSGGVTTTPWSLARPQGSCSLEGVEIKGGSFRLLQEGQALEYVCPSGFYPYPVQTRTCRSTG
SWSTLKTQDQKTVRKAECRAIHCPRPHDFENGEYWPRSPYYNVSDEISFHCYDGYTLRGSANRTCQVNGRWSGQTAICDN
GAGYCSNPGIPIGTRKVGSQYRLEDSVTYHCSRGLTLRGSQRRTCQEGGSWSGTEPSCQDSFMYDTPQEVAEAFLSSLTE
TIEGVDAEDGHGPGEQQKRKIVLDPSGSMNIYLVLDGSDSIGASNFTGAKKCLVNLIEKVASYGVKPRYGLVTYATYPKI
WVKVSEADSSNADWVTKQLNEINYEDHKLKSGTNTKKALQAVYSMMSWPDDVPPEGWNRTRHVIILMTDGLHNMGGDPIT
VIDEIRDLLYIGKDRKNPREDYLDVYVFGVGPLVNQVNINALASKKDNEQHVFKVKDMENLEDVFYQMIDESQSLSLCGM
VWEHRKGTDYHKQPWQAKISVIRPSKGHESCMGAVVSEYFVLTAAHCFTVDDKEHSIKVSVGGEKRDLEIEVVLFHPNYN
INGKKEAGIPEFYDYDVALIKLKNKLKYGQTIRPICLPCTEGTTRALRLPPTTTCQQQKEELLPAQDIKALFVSEEEKKL
TRKEVYIKNGDKKGSCERDAQYAPGYDKVKDISEVVTPRFLCTGGVSPYADPNTCRGDSGGPLIVHKRSRFIQVGVISWG
VVDVCKNQKRQKQVPAHARDFHINLFQVLPWLKEKLQDEDLGFL
>P00752
APPIQSRIIGGRECEKNSHPWQVAIYHYSSFQCGGVLVNPKWVLTAAHCKNDNYEVWLGRHNLFENENTAQFFGVTADFP
HPGFNLSLLKXHTKADGKDYSHDLMLLRLQSPAKITDAVKVLELPTQEPELGSTCEASGWGSIEPGPDBFEFPDEIQCVQ
LTLLQNTFCABAHPBKVTESMLCAGYLPGGKDTCMGDSGGPLICNGMWQGITSWGHTPCGSANKPSIYTKLIFYLDWIND
TITENP
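If installing Biostrings is not an option, the same multi-record layout can be produced with base R alone. The following is only a minimal sketch: `write_multifasta` is a hypothetical helper (it is not part of protr, seqinr, or Biostrings), and the toy sequences stand in for the list returned by `getUniProt()`. It assumes one non-empty sequence per ID and wraps each sequence at a fixed line width, as in the output above.

```r
# Hypothetical helper (not from protr or Biostrings): write a collection of
# sequences to ONE multi-FASTA file using base R only.
write_multifasta <- function(seqs, ids, path, width = 80) {
  seqs <- unlist(seqs, use.names = FALSE)
  # Assumption: exactly one non-empty sequence per ID
  stopifnot(length(seqs) == length(ids), all(nchar(seqs) > 0))
  records <- mapply(function(id, s) {
    # Split the sequence into chunks of at most `width` residues
    starts <- seq(1, nchar(s), by = width)
    chunks <- substring(s, starts, pmin(starts + width - 1, nchar(s)))
    # Header line followed by the wrapped sequence lines
    c(paste0(">", id), chunks)
  }, ids, seqs, SIMPLIFY = FALSE, USE.NAMES = FALSE)
  writeLines(unlist(records), path)
  invisible(path)
}

# Toy sequences standing in for the result of getUniProt(IDs)
toy <- list("MDAMKRGLCC", "MGSNLSPQLCLMPF")
out <- tempfile(fileext = ".fa")
write_multifasta(toy, c("seq1", "seq2"), out, width = 10)
readLines(out)
```

With real data the call would look like `write_multifasta(Proteins_IDs, IDs, "your_multifasta.fa")`. The Biostrings route remains the better choice when the sequences will be processed further, since `AAStringSet` objects carry their names with them.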