
How to write a FASTA file with multiple sequences in it within R

I have been retrieving FASTA sequences from an online database (UniProt) by their accession numbers, using the following library:

    install.packages("protr")
    
    library("protr")


IDs <- c( "xxxx","AAAAA")

Proteins_IDs <- getUniProt(IDs)

#Test for this
Proteins_IDs

This works perfectly and gives me the sequences of interest in FASTA format, which I can then write to disk. My problem is writing the multiple sequences into ONE merged FASTA file. Currently, I have a method for writing an individual FASTA file for each sequence that I retrieved, using the code below:

library(seqinr)  # provides write.fasta()

# Writes one FASTA file per sequence, named after its accession number
for (i in seq_along(Proteins_IDs)) {
  write.fasta(Proteins_IDs[i], names = IDs[i], file.out = paste0(IDs[i], ".fasta"))
}

The problem is that this creates an individual FASTA file for each sequence rather than one combined file containing all of the sequences.

Use the functions of the Biostrings package from Bioconductor when dealing with FASTA files, and in general with any kind of "biological" (DNA, RNA, AA) strings, in R:

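library(protr)
IDs <- c("P00750", "P00751", "P00752")
Proteins_IDs <- getUniProt(IDs)
# Name each sequence by its accession; the names become the FASTA headers
names(Proteins_IDs) <- IDs

library(Biostrings)
# Collect all sequences into a single amino-acid string set
multifasta <- Biostrings::AAStringSet(unlist(Proteins_IDs))

# Write every sequence to one multi-FASTA file
Biostrings::writeXStringSet(multifasta, "your_multifasta.fa")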

Output in the file:

>P00750
MDAMKRGLCCVLLLCGAVFVSPSQEIHARFRRGARSYQVICRDEKTQMIYQQHQSWLRPVLRSNRVEYCWCNSGRAQCHS
VPVKSCSEPRCFNGGTCQQALYFSDFVCQCPEGFAGKCCEIDTRATCYEDQGISYRGTWSTAESGAECTNWNSSALAQKP
YSGRRPDAIRLGLGNHNYCRNPDRDSKPWCYVFKAGKYSSEFCSTPACSEGNSDCYFGNGSAYRGTHSLTESGASCLPWN
SMILIGKVYTAQNPSAQALGLGKHNYCRNPDGDAKPWCHVLKNRRLTWEYCDVPSCSTCGLRQYSQPQFRIKGGLFADIA
SHPWQAAIFAKHRRSPGERFLCGGILISSCWILSAAHCFQERFPPHHLTVILGRTYRVVPGEEEQKFEVEKYIVHKEFDD
DTYDNDIALLQLKSDSSRCAQESSVVRTVCLPPADLQLPDWTECELSGYGKHEALSPFYSERLKEAHVRLYPSSRCTSQH
LLNRTVTDNMLCAGDTRSGGPQANLHDACQGDSGGPLVCLNDGRMTLVGIISWGLGCGQKDVPGVYTKVTNYLDWIRDNM
RP
>P00751
MGSNLSPQLCLMPFILGLLSGGVTTTPWSLARPQGSCSLEGVEIKGGSFRLLQEGQALEYVCPSGFYPYPVQTRTCRSTG
SWSTLKTQDQKTVRKAECRAIHCPRPHDFENGEYWPRSPYYNVSDEISFHCYDGYTLRGSANRTCQVNGRWSGQTAICDN
GAGYCSNPGIPIGTRKVGSQYRLEDSVTYHCSRGLTLRGSQRRTCQEGGSWSGTEPSCQDSFMYDTPQEVAEAFLSSLTE
TIEGVDAEDGHGPGEQQKRKIVLDPSGSMNIYLVLDGSDSIGASNFTGAKKCLVNLIEKVASYGVKPRYGLVTYATYPKI
WVKVSEADSSNADWVTKQLNEINYEDHKLKSGTNTKKALQAVYSMMSWPDDVPPEGWNRTRHVIILMTDGLHNMGGDPIT
VIDEIRDLLYIGKDRKNPREDYLDVYVFGVGPLVNQVNINALASKKDNEQHVFKVKDMENLEDVFYQMIDESQSLSLCGM
VWEHRKGTDYHKQPWQAKISVIRPSKGHESCMGAVVSEYFVLTAAHCFTVDDKEHSIKVSVGGEKRDLEIEVVLFHPNYN
INGKKEAGIPEFYDYDVALIKLKNKLKYGQTIRPICLPCTEGTTRALRLPPTTTCQQQKEELLPAQDIKALFVSEEEKKL
TRKEVYIKNGDKKGSCERDAQYAPGYDKVKDISEVVTPRFLCTGGVSPYADPNTCRGDSGGPLIVHKRSRFIQVGVISWG
VVDVCKNQKRQKQVPAHARDFHINLFQVLPWLKEKLQDEDLGFL
>P00752
APPIQSRIIGGRECEKNSHPWQVAIYHYSSFQCGGVLVNPKWVLTAAHCKNDNYEVWLGRHNLFENENTAQFFGVTADFP
HPGFNLSLLKXHTKADGKDYSHDLMLLRLQSPAKITDAVKVLELPTQEPELGSTCEASGWGSIEPGPDBFEFPDEIQCVQ
LTLLQNTFCABAHPBKVTESMLCAGYLPGGKDTCMGDSGGPLICNGMWQGITSWGHTPCGSANKPSIYTKLIFYLDWIND
TITENP
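
For comparison, a multi-FASTA file is just a series of `>header` lines, each followed by wrapped sequence lines, so the same kind of output can be sketched in base R without Biostrings. This is only a minimal sketch under stated assumptions: `write_multifasta` is a hypothetical helper (it is not part of protr or Biostrings), the two sequences are short made-up placeholders standing in for the result of `getUniProt(IDs)`, and the 80-character line width is chosen to match the output shown above.

```r
# Hypothetical helper (not from any package): write a named list or character
# vector of sequences to ONE FASTA file, wrapping lines at `width` characters.
write_multifasta <- function(seqs, file, width = 80) {
  stopifnot(!is.null(names(seqs)))
  lines <- unlist(lapply(seq_along(seqs), function(i) {
    s <- as.character(seqs[[i]])
    # Start position of each wrapped line (at least one line per sequence)
    starts <- seq(1, max(nchar(s), 1), by = width)
    c(paste0(">", names(seqs)[i]), substring(s, starts, starts + width - 1))
  }))
  writeLines(lines, file)
  invisible(lines)
}

# Placeholder data, NOT real UniProt sequences
example_seqs <- list(seqA = strrep("MKV", 30), seqB = "ACDEFGHIK")
out_file <- tempfile(fileext = ".fa")
write_multifasta(example_seqs, out_file)
```

With the real data, this would presumably be called as `write_multifasta(Proteins_IDs, "your_multifasta.fa")` after `names(Proteins_IDs) <- IDs`. Biostrings is likely still the better choice when the sequences are needed for further analysis in R.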


 