Loop for iterating through a variable URL (Python)

Python newbie here.

I am developing a simple search program for DNA sequences. The main idea is to fetch from the NCBI database the sequence for a specific genome and a pair of start-end points. So far, I am able to do a simple search for one genome and one specific position:

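  # Python 3 module layout (in Python 2 this was urllib.urlopen)
  from urllib.request import urlopen

  genome = "NC_009089.1"
  start = "359055"
  end = "359070"

  link = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=%s&rettype=fasta&seq_start=%s&seq_stop=%s" % (genome, start, end)

  # read() returns bytes in Python 3, so decode before splitting into lines
  f = urlopen(link)
  myfile = f.read().decode()
  # Line 0 is the FASTA header; line 1 is the sequence
  print(myfile.splitlines()[1])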

And this is the output I'm getting (the sequence in that position):

AGTAAAACGGTTTCCT

Now, I would like to fetch several sequences from different genomes, each with its own start-end points, in one run, and return all the sequences that were found. I've tried importing the data as a CSV with the genomes in the first column, the starts in the second and the ends in the third, and then looping over the open file with a for loop, but since I'm not familiar with changing variables in URLs, I don't know how to proceed.

Sorry if this is a naive question. Any help would be appreciated.

If you already have all the parameters in a file, you can iterate over that data and make one request per row like this (I use lists because you don't show the code that reads your data from the file):

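# Python 3 module layout (Python 2: urllib.urlencode, urllib.urlopen)
from urllib.parse import urlencode
from urllib.request import urlopen

url = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi'

for genome, start, end in zip(genome_list, start_list, end_list):
    # Keys must match the parameter names used in the question's URL
    params = {
        'db': 'nuccore',
        'rettype': 'fasta',
        'id': genome,
        'seq_start': start,
        'seq_stop': end,
    }
    # urlencode() turns the dict into "key=value&key=value"
    f = urlopen(url + '?' + urlencode(params))

Building the query string from a dict with urlencode() takes care of encoding all the parameters properly (with = and &). Note that urlopen() itself does not accept a dict, and that the keys have to be the names efetch expects (id, seq_start, seq_stop), as in the URL from the question.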
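
The loop above assumes the three lists already exist. As a rough sketch of the missing piece, here is one way to read the genome, start and end columns from a CSV and build one URL per row, using the parameter names from the URL in the question. The helper names (build_efetch_url, read_regions, extract_sequence) and the sample rows are made up for illustration, and the CSV is assumed to have no header row. Nothing here touches the network.

```python
import csv
import io
from urllib.parse import urlencode

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(genome, start, end):
    """Return the efetch URL for one genome accession and start-end range."""
    params = {
        "db": "nuccore",
        "id": genome,
        "rettype": "fasta",
        "seq_start": start,
        "seq_stop": end,
    }
    return EFETCH_URL + "?" + urlencode(params)


def read_regions(csv_file):
    """Yield (genome, start, end) tuples from an open CSV file (no header assumed)."""
    for row in csv.reader(csv_file):
        if len(row) >= 3:  # skip blank or incomplete rows
            genome, start, end = (field.strip() for field in row[:3])
            yield genome, start, end


def extract_sequence(fasta_text):
    """Join every line after the FASTA header, since long sequences can wrap."""
    lines = fasta_text.strip().splitlines()
    return "".join(lines[1:])


# Hypothetical sample data standing in for the real CSV file.
sample_csv = io.StringIO("NC_009089.1,359055,359070\nNC_009089.1,1,20\n")
urls = [build_efetch_url(*region) for region in read_regions(sample_csv)]
for u in urls:
    print(u)
```

With a real file you would pass open("regions.csv", newline="") instead of the StringIO object (the file name is a placeholder), fetch each URL with urlopen(), and hand the decoded response text to extract_sequence(). Joining all lines after the header rather than taking only line 1 matters once a range is long enough for the FASTA output to wrap.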


If urllib feels a bit complicated, you can try the Python requests library, which is much nicer to work with in my experience (but it is a third-party library, not built in).
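
For comparison, a minimal sketch of the same request with requests, which does accept a dict of query parameters through its params argument. The helper names efetch_params and fetch_sequence are hypothetical, the 30-second timeout is an arbitrary choice, and requests has to be installed separately (pip install requests).

```python
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_params(genome, start, end):
    """Build the query parameters, named as in the URL from the question."""
    return {
        "db": "nuccore",
        "id": genome,
        "rettype": "fasta",
        "seq_start": start,
        "seq_stop": end,
    }


def fetch_sequence(genome, start, end):
    """Fetch one region from NCBI and return the sequence without the FASTA header."""
    # Imported here so efetch_params() stays usable without requests installed.
    import requests

    response = requests.get(
        EFETCH_URL,
        params=efetch_params(genome, start, end),  # requests encodes the dict
        timeout=30,
    )
    response.raise_for_status()  # raise an exception on HTTP error codes
    lines = response.text.splitlines()
    return "".join(lines[1:])  # drop the header line, join wrapped sequence lines


# Example call (needs network access):
# print(fetch_sequence("NC_009089.1", "359055", "359070"))
```

Calling fetch_sequence() once per (genome, start, end) row inside the for loop collects all the sequences; appending each result to a list keeps them for later use.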
