How can I find the number of the first base of a gene in a FASTA file?

In order to manually modify a .gff file I have, I need to find the start position of my gene in the FASTA-formatted genome of my animal (i.e. at which base number in the sequence does it start?). I have the sequence of this gene.

How do I do this as easily as possible (this is not an animal whose genome is readily available on the internet)?

What I have: the genome, in FASTA format; a GFF file containing an annotation for this organism's genome (which sorely needs to be updated); the sequence of this gene.

Thank you!

If you know that the gene sequence is identical to that in the reference, search for it directly (using Python):

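import re

# your_genome_seq: one sequence as a single string, without the FASTA header or line breaks
match = re.search(your_gene_seq.upper(), your_genome_seq.upper())
if match:
    gene_start = match.start() + 1  # match.start() is 0-based; GFF positions are 1-based
    gene_end = match.end()          # 1-based, inclusive end position
else:
    print("no match")  # the gene may be on the reverse strand or differ from the reference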
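The exact-match search needs the genome as one plain string per sequence, and a gene annotated on the minus strand will only be found as its reverse complement. Below is a minimal sketch, using only the standard library, that reads a multi-record FASTA file and reports hits on both strands in 1-based, inclusive coordinates (the convention GFF uses). The helper names `read_fasta`, `reverse_complement` and `find_gene`, and the demo sequences, are made up for illustration and do not come from any library.

```python
import io

# Complement table for unambiguous bases plus N (other IUPAC codes are left unchanged)
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def read_fasta(handle):
    """Yield (record_id, sequence) pairs from an open FASTA file."""
    record_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if record_id is not None:
                yield record_id, "".join(chunks)
            # The record ID is the header text up to the first whitespace
            fields = line[1:].split()
            record_id, chunks = (fields[0] if fields else ""), []
        else:
            # Upper-case so that soft-masked (lower-case) regions still match
            chunks.append(line.upper())
    if record_id is not None:
        yield record_id, "".join(chunks)


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_gene(genome_records, gene_seq):
    """Return (record_id, start, end, strand) for every exact hit, 1-based inclusive."""
    gene_seq = gene_seq.upper()
    queries = [("+", gene_seq), ("-", reverse_complement(gene_seq))]
    hits = []
    for record_id, seq in genome_records:
        for strand, query in queries:
            pos = seq.find(query)
            while pos != -1:
                hits.append((record_id, pos + 1, pos + len(query), strand))
                pos = seq.find(query, pos + 1)
    return hits


# Demo on an in-memory FASTA; with a real file use: open("genome.fasta")
demo_fasta = io.StringIO(
    ">chr1 demo record\nATGGCCATTG\nTAATGGGCCGC\n>chr2\nGGATTACAATGCC\n"
)
records = list(read_fasta(demo_fasta))
hits = find_gene(records, "CATTGTAAT")
print(hits)
```

The `start` value of a hit is what goes into column 4 of the GFF line and `end` into column 5, whichever strand the hit is on. If more than one hit comes back, the gene sequence is not unique in the genome and the right copy has to be picked by hand.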

Otherwise, you will need to do a pairwise alignment of your gene to the reference.

Using Biopython:

python -m pip install biopython

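from Bio import pairwise2

# alignment scores: match = 5, mismatch = -4, gap open = -2, gap extend = -0.5
# local (not global) alignment, so that `start` marks where the gene aligns within the genome
alignment = pairwise2.align.localms(your_gene_seq, your_genome_seq, 5, -4, -2, -0.5, one_alignment_only=True)[0]
# `start` indexes the gapped alignment: subtract gaps in the genome row, then add 1 for GFF's 1-based positions
gene_start = alignment.start - alignment.seqB[:alignment.start].count("-") + 1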

To update the GFF, use Biopython:

https://biopython.org/wiki/GFF_Parsing
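For a one-off manual fix, the coordinates can also be rewritten without a parser, since a GFF feature line is nine tab-separated columns with the 1-based start in column 4 and the inclusive end in column 5. The sketch below assumes GFF3-style attributes in which the gene carries an `ID=` tag. The function names, the gene ID `gene0001` and the demo lines are hypothetical.

```python
def set_feature_coords(gff_line, start, end):
    """Return a GFF line with columns 4 (start) and 5 (end) replaced."""
    fields = gff_line.rstrip("\n").split("\t")
    if gff_line.startswith("#") or len(fields) != 9:
        return gff_line  # comment, directive, or malformed line: leave untouched
    if not 1 <= start <= end:
        raise ValueError("GFF needs 1-based coordinates with start <= end")
    fields[3], fields[4] = str(start), str(end)
    return "\t".join(fields) + "\n"


def update_gene(lines, gene_id, start, end):
    """Yield GFF lines, rewriting 'gene' features whose attributes contain ID=<gene_id>."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 9 and fields[2] == "gene":
            attributes = fields[8].split(";")
            if f"ID={gene_id}" in attributes:
                line = set_feature_coords(line, start, end)
        yield line


# Demo on in-memory lines; with a real file, iterate over open("annotation.gff")
demo = [
    "##gff-version 3\n",
    "chr1\tmanual\tgene\t100\t200\t.\t+\t.\tID=gene0001;Name=demo\n",
    "chr1\tmanual\tgene\t300\t400\t.\t-\t.\tID=gene0002\n",
]
updated = list(update_gene(demo, "gene0001", 6, 14))
print("".join(updated), end="")
```

Only the `gene` line is changed here. Child features such as mRNA, exon and CDS lines carry their own coordinates and would need the same treatment.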

