Extract specific Fasta data from NCBI web page

Question

I'm trying to extract just the FASTA-format sequence for "NM_213035.1", but the output contains everything on the page except the FASTA.
When I inspect the page, the FASTA text is inside a <pre> tag.

The code:

from bs4 import BeautifulSoup
import requests

# Request the HTML report page for the accession (not the raw FASTA record)
response = requests.get("https://www.ncbi.nlm.nih.gov/nuccore/{FASTA}?report=fasta".format(FASTA="NM_213035.1"))
response.raise_for_status()
ncbi = BeautifulSoup(response.text, "html.parser")

# Use the page title as the output file name
filename = ncbi.title.text
with open(filename, 'w+') as f:
    # Write the text of every <p> element found in the returned HTML
    for i in ncbi.select('p'):
        f.write(i.getText())

The output:

Warning: The NCBI web site requires JavaScript to function. more... Download features.Download gene features.NCBI Reference Sequence: NM_213035.1

GenBank Graphics

Whole sequence

Selected region

from:

to:

Show reverse complement

Show gap features Your browsing activity is empty.Activity recording is turned off. Turn recording back on

National Center for Biotechnology Information, US National Library of Medicine

8600 Rockville Pike, Bethesda MD, 20894 USA

Answer 1

You are not using the correct URL to fetch FASTA files via the REST API. As @Ghoti pointed out, the correct URLs are described here: https://www.ncbi.nlm.nih.gov/books/NBK25497/

For your specific problem this would be:

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=NM_213035.1&rettype=fasta&retmode=text
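
As a rough sketch of how that URL could be requested from Python: the helper names build_efetch_url and fetch_fasta below are hypothetical, and the standard library's urllib is used here instead of requests only so that nothing has to be installed. The query parameters are the same ones that appear in the URL above.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address of the efetch endpoint, taken from the URL above
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession, db="nuccore"):
    """Build the efetch URL that asks for a record as plain-text FASTA."""
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return EFETCH_URL + "?" + urlencode(params)


def fetch_fasta(accession, db="nuccore", timeout=30):
    """Download the record and return it as a string (needs network access)."""
    with urlopen(build_efetch_url(accession, db), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling fetch_fasta("NM_213035.1") should then return the record as plain text, which can be written to a file directly, with no HTML parsing involved.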

If you are using Python, you could use Biotite for this task, a package I am developing: https://www.biotite-python.org/apidoc/biotite.database.entrez.fetch.html
