
How to extract genes from a GenBank file in R

I am asking this question because I don't really know how to do it.

I have a genome in GenBank format (YJ016_I.gb). I want to import it into R and then either export all the genes as nucleotide sequences, or retrieve a single sequence by gene name.

library(genbankr)
library(stringr)
library(purrr)
gb <- genbankr::readGenBank("YJ016_I.gb")
GENES <- GenomicFeatures::genes(gb)
GenesDF <- data.frame(GENES)

  seqnames start  end width strand type gene     locus_tag old_locus_tag loctype pseudo gene_synonym       gene_id
1        I   353  787   435      - gene mioC BJE04_RS00275  VV0001, ....  normal  FALSE           NA BJE04_RS00275
2        I   835 2196  1362      - gene mnmE BJE04_RS00280  VV0002, ....  normal  FALSE           NA BJE04_RS00280
3        I  2314 3933  1620      - gene yidC BJE04_RS00285  VV0003, ....  normal  FALSE           NA BJE04_RS00285
4        I  3936 4193   258      - gene yidD BJE04_RS23545                normal  FALSE           NA BJE04_RS23545     
5        I  4160 4516   357      - gene rnpA BJE04_RS00290  VV0004, ....  normal  FALSE           NA BJE04_RS00290
6        I  4530 4670   141      - gene rpmH BJE04_RS00295  VV0005, ....  normal  FALSE           NA BJE04_RS00295

I would like to extract the sequence for a given locus_tag such as BJE04_RS00275, or alternatively export all the genes in the GenBank file.

That is, I want output like this:

>BJE04_RS00275
aatgc
>BJE04_RS00280
ggcta
>BJE04_5000
atggcaa

How can I do this in R? Solutions in any other language or program are welcome too.

Thanks.

I do not have access to your specific file, so I am using the example file provided by the genbankr package. You will need Biostrings to write the sequences to a FASTA file. To write only the sequence(s) for a particular locus_tag, subset GENES to that entry first (e.g. subset(GENES, locus_tag == "BJE04_RS00275") in your example).

library(genbankr)
library(GenomicFeatures)
library(Biostrings)

gb <- readGenBank(system.file("sample.gbk", package="genbankr"))
GENES <- genes(gb)
res <- getSeq(gb@sequence, setNames(GENES, GENES$gene_id))
writeXStringSet(res, "YJ016_I_genes.fa")
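
The single-gene case can be sketched along the same lines. This is a minimal sketch, not a tested solution for your file: `write_gene_fasta` is a hypothetical helper name (it is not part of genbankr), and it assumes the record has a `locus_tag` column like the one shown in the question's table. It reuses only the calls already used above.

```r
library(genbankr)
library(GenomicFeatures)
library(Biostrings)

# Hypothetical helper (not part of genbankr): write the gene(s) matching
# one locus_tag to a FASTA file, using the locus_tag as the FASTA header.
write_gene_fasta <- function(gb, tag, path) {
  GENES <- genes(gb)
  # %in% treats missing locus_tag values as non-matches instead of NA
  one <- GENES[GENES$locus_tag %in% tag]
  if (length(one) == 0) {
    stop("no gene with locus_tag ", tag)
  }
  res <- getSeq(gb@sequence, setNames(one, one$locus_tag))
  writeXStringSet(res, path)
  invisible(res)
}
```

With your file this would be called as `gb <- readGenBank("YJ016_I.gb")` followed by `write_gene_fasta(gb, "BJE04_RS00275", "BJE04_RS00275.fa")`. Naming the ranges by `locus_tag` rather than `gene_id` is just a choice here; in the table from the question the two columns hold the same values.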
