[英]How to apply a function to multiple FASTA sequences within a text file?
[英]How to a fasta file with multiple sequences in it within R
我一直在從在線數據庫(uniprot)中提取 fasta 文件,方法是使用以下庫獲取它們的入藏號:
install.packages("protr")
library("protr")
IDs <- c( "xxxx","AAAAA")
Proteins_IDs <- getUniProt(IDs)
#Test for this
Proteins_IDs
這非常適合以我可以編寫的 fasta 格式獲取我感興趣的序列。 我遇到的問題是將多個序列寫入一個單獨的合並 fasta 文件。 目前,我已經確定了一種為我使用下面的代碼抓取的每個單獨序列編寫單獨的 fasta 文件的方法:
x <- for(i in 1:length(Proteins_IDs)){
write.fasta(Proteins_IDs[i], names=Proteins_IDs[i], file.out=paste(Proteins_IDs[i], ".fasta", sep=""))
}
問題是這會為每個文件創建單獨的 fasta 文件,而不是包含多個序列的組合較大文件。
在處理 fasta 文件時,使用來自Bioconductor的Biostrings
package 的功能,以及通常在 R 中的任何類型的“生物”(DNA、RNA、AA)字符串:
library(protr)
IDs <- c("P00750", "P00751", "P00752")
Proteins_IDs <- getUniProt(IDs)
names(Proteins_IDs) <- IDs
library(Biostrings)
multifasta <- Biostrings::AAStringSet(unlist(Proteins_IDs))
Biostrings::writeXStringSet(multifasta, "your_multifasta.fa")
文件中的Output:
>P00750
MDAMKRGLCCVLLLCGAVFVSPSQEIHARFRRGARSYQVICRDEKTQMIYQQHQSWLRPVLRSNRVEYCWCNSGRAQCHS
VPVKSCSEPRCFNGGTCQQALYFSDFVCQCPEGFAGKCCEIDTRATCYEDQGISYRGTWSTAESGAECTNWNSSALAQKP
YSGRRPDAIRLGLGNHNYCRNPDRDSKPWCYVFKAGKYSSEFCSTPACSEGNSDCYFGNGSAYRGTHSLTESGASCLPWN
SMILIGKVYTAQNPSAQALGLGKHNYCRNPDGDAKPWCHVLKNRRLTWEYCDVPSCSTCGLRQYSQPQFRIKGGLFADIA
SHPWQAAIFAKHRRSPGERFLCGGILISSCWILSAAHCFQERFPPHHLTVILGRTYRVVPGEEEQKFEVEKYIVHKEFDD
DTYDNDIALLQLKSDSSRCAQESSVVRTVCLPPADLQLPDWTECELSGYGKHEALSPFYSERLKEAHVRLYPSSRCTSQH
LLNRTVTDNMLCAGDTRSGGPQANLHDACQGDSGGPLVCLNDGRMTLVGIISWGLGCGQKDVPGVYTKVTNYLDWIRDNM
RP
>P00751
MGSNLSPQLCLMPFILGLLSGGVTTTPWSLARPQGSCSLEGVEIKGGSFRLLQEGQALEYVCPSGFYPYPVQTRTCRSTG
SWSTLKTQDQKTVRKAECRAIHCPRPHDFENGEYWPRSPYYNVSDEISFHCYDGYTLRGSANRTCQVNGRWSGQTAICDN
GAGYCSNPGIPIGTRKVGSQYRLEDSVTYHCSRGLTLRGSQRRTCQEGGSWSGTEPSCQDSFMYDTPQEVAEAFLSSLTE
TIEGVDAEDGHGPGEQQKRKIVLDPSGSMNIYLVLDGSDSIGASNFTGAKKCLVNLIEKVASYGVKPRYGLVTYATYPKI
WVKVSEADSSNADWVTKQLNEINYEDHKLKSGTNTKKALQAVYSMMSWPDDVPPEGWNRTRHVIILMTDGLHNMGGDPIT
VIDEIRDLLYIGKDRKNPREDYLDVYVFGVGPLVNQVNINALASKKDNEQHVFKVKDMENLEDVFYQMIDESQSLSLCGM
VWEHRKGTDYHKQPWQAKISVIRPSKGHESCMGAVVSEYFVLTAAHCFTVDDKEHSIKVSVGGEKRDLEIEVVLFHPNYN
INGKKEAGIPEFYDYDVALIKLKNKLKYGQTIRPICLPCTEGTTRALRLPPTTTCQQQKEELLPAQDIKALFVSEEEKKL
TRKEVYIKNGDKKGSCERDAQYAPGYDKVKDISEVVTPRFLCTGGVSPYADPNTCRGDSGGPLIVHKRSRFIQVGVISWG
VVDVCKNQKRQKQVPAHARDFHINLFQVLPWLKEKLQDEDLGFL
>P00752
APPIQSRIIGGRECEKNSHPWQVAIYHYSSFQCGGVLVNPKWVLTAAHCKNDNYEVWLGRHNLFENENTAQFFGVTADFP
HPGFNLSLLKXHTKADGKDYSHDLMLLRLQSPAKITDAVKVLELPTQEPELGSTCEASGWGSIEPGPDBFEFPDEIQCVQ
LTLLQNTFCABAHPBKVTESMLCAGYLPGGKDTCMGDSGGPLICNGMWQGITSWGHTPCGSANKPSIYTKLIFYLDWIND
TITENP
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.