I'm trying to download DNA sequence data from NCBI using entrez_fetch. With the following code, I search for the IDs of the sequences I need with entrez_search, and then I attempt to download the sequence data in FASTA format:
library(rentrez)
#Search for sequence ids
search <- entrez_search(db = "biosample",
term = "Escherichia coli[Organism] AND geo_loc_name=USA:WA[attr]",
retmax = 9999, use_history = TRUE)
search$ids
length(search$ids)
search$web_history
#Download sequence data
ecoli_fasta <- entrez_fetch(db = "nuccore",
web_history = search$web_history,
rettype = "fasta")
When I do this, I get the following error:
Error: HTTP failure: 400
Cannot+retrieve+query+from+history
I don't understand what this means and Googling hasn't led me to an answer.
As an alternative, I tried a different package (ape) and its function read.GenBank to download the sequences, but this method only managed to download about 1000 of the 12000 sequences I needed. I would like to use entrez_fetch if possible. Does anyone have any insight for me?
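This may be a starter. The error most likely arises because the web history comes from a biosample search but is then passed to a nuccore fetch; as far as I know, a web history can only be used with the database that was searched. The example here searches nuccore directly.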
Also be aware that queries to genome databases can return massive amounts of data, so be sure to limit your queries.
library(rentrez)
search <- entrez_search(db="nuccore",
term="Escherichia coli[Organism]",
use_history = TRUE)
cat(entrez_fetch(db="nuccore",
web_history=search$web_history, rettype="fasta", retstart=24, retmax=100))
>pdb|7QQ3|I Chain I, 23S ribosomal RNA
NGTTAAGCGACTAAGCGTACACGGTGGATGCCCTGGCAGTCAGAGGCGATGAAGGACGTGCTAATCTGCG
ATAAGCGTCGGTAAGGTGATATGAACCGTTATAACCGGCGATTTCCGAATGGGGAAACCCAGTGTGTTTC
GACACACTATCATTAACTGAATCCATAGGTTAATGAGGCGAACCGGGGGAACTGAAACATCTAAGTACCC
CGAGGAAAAGAAATCAACCGAGATTCCCCCAGTAGCGGCGAGCGAACGGGGAGCAGCCCAGAGCCTGAAT
CAGTGTGTGTGTTAGTGGAAGCGTCTGGAAAGGCGCGCGATACAGGGTGACAGCCCCGTACACAAAAATG
CACATGCTGTGAGCTCGATGAGTAGGGCGGGACACGTGGTATCCTGTCTGAATATGGGGGGACCATCCTC
CAAGGCTAAATACTCCTGACTGACCGATAGTGAACCAGTACCGTGAGGGAAAGGCGAAAAGAACCCCGGC
...
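Use a loop to cycle through the sequences in batches. Note that retstart is zero-based, so start the offsets at 0 to avoid skipping the first record, e.g.
# Fetch 300 records in three batches of 100 (offsets 0, 100, 200)
for(i in seq(0, 200, 100)){
cat(entrez_fetch(db="nuccore",
web_history=search$web_history, rettype="fasta", retstart=i, retmax=100))
}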
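To download the full result set rather than a fixed number of records, the loop above can be driven by the hit count that entrez_search returns in search$count. The following is only a minimal sketch, not a tested recipe: batch_starts and fetch_fasta_batches are hypothetical helper names, and the path and pause arguments are illustrative assumptions. It starts the offsets at 0 because NCBI's retstart is zero-based, and it appends each batch to a file instead of printing it.

```r
# Zero-based start offsets for fetching `total` records in chunks of
# `batch_size`. (Hypothetical helper, not part of rentrez.)
batch_starts <- function(total, batch_size = 100) {
  if (total <= 0) return(numeric(0))
  seq(0, total - 1, by = batch_size)
}

# Fetch every record behind a web history as FASTA and append it to `path`.
# (Hypothetical helper; `path` and `pause` are illustrative parameters.)
fetch_fasta_batches <- function(web_history, total, path,
                                batch_size = 100, pause = 0.5) {
  for (start in batch_starts(total, batch_size)) {
    recs <- rentrez::entrez_fetch(db = "nuccore",
                                  web_history = web_history,
                                  rettype = "fasta",
                                  retstart = start,
                                  retmax = batch_size)
    cat(recs, file = path, append = TRUE)
    Sys.sleep(pause)  # short pause between requests
  }
  invisible(path)
}

# Example call (needs network access and the rentrez package):
# search <- rentrez::entrez_search(db = "nuccore",
#                                  term = "Escherichia coli[Organism]",
#                                  use_history = TRUE)
# fetch_fasta_batches(search$web_history, search$count, "ecoli.fasta")
```

Check search$count before running the example call: an unrestricted Escherichia coli query matches a very large number of records, so narrow the search term first.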