How to get the first URL from a for-in loop (Python)

I am trying to get only the first link from dataInfo in the loop. This script lets me collect image links and download the image files. I want only the first image, not all of them; that is my problem.

# Get results using JSON
results = simplejson.load(response)
data = results['responseData']
dataInfo = data['results']

# Iterate for each result and get unescaped url
for myUrl in dataInfo:
    count = count + 1
    print myUrl['unescapedUrl']

    myopener.retrieve(myUrl['unescapedUrl'],str(count)+'.jpg')

Here is the whole source code:

import os
import sys
import time
from urllib import FancyURLopener
import urllib2
import simplejson

# Define search term
searchTerm = "intel i7"

# Replace spaces ' ' in search term for '%20' in order to comply with request
searchTerm = searchTerm.replace(' ','%20')


# Start FancyURLopener with defined version 
class MyOpener(FancyURLopener): 
    version = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11'
myopener = MyOpener()

# Set count to 0
count= 0

for i in range(0,10):
    # Notice that the start changes for each iteration in order to request a new set of images for each loop
    url = ('https://ajax.googleapis.com/ajax/services/search/images?' + 'v=1.0&q='+searchTerm+'&start='+str(i*4)+'&userip=MyIP')
    print url
    request = urllib2.Request(url, None, {'Referer': 'testing'})
    response = urllib2.urlopen(request)

    # Get results using JSON
    results = simplejson.load(response)
    data = results['responseData']
    dataInfo = data['results']

    # Iterate for each result and get unescaped url
    for myUrl in dataInfo:
        count = count + 1
        print myUrl['unescapedUrl']

        myopener.retrieve(myUrl['unescapedUrl'],str(count)+'.jpg')

    # Sleep for one second to prevent IP blocking from Google
    time.sleep(1)

Once you have your image, use break to exit the for loop after the first iteration. For example:

for myUrl in dataInfo:
    count = count + 1
    print myUrl['unescapedUrl']
    myopener.retrieve(myUrl['unescapedUrl'],str(count)+'.jpg')
    break
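One thing to watch in the full script: break only leaves the inner loop, so the outer `for i in range(0,10)` loop would still request the next page and download one image per page. The sketch below illustrates this with stand-in data instead of real API responses. The names `pages` and `first_url_overall` are hypothetical, and it uses the `print(...)` form that runs under both Python 2 and Python 3.

```python
# Stand-in for the per-page result lists (hypothetical data, no network access).
pages = [
    [{'unescapedUrl': 'http://example.com/p0-a.jpg'},
     {'unescapedUrl': 'http://example.com/p0-b.jpg'}],
    [{'unescapedUrl': 'http://example.com/p1-a.jpg'}],
]

# break only exits the inner loop, so one URL is picked per page.
picked = []
for data_info in pages:
    for my_url in data_info:
        picked.append(my_url['unescapedUrl'])
        break

print(len(picked))


def first_url_overall(pages):
    """Return the first URL across all pages, or None if there is none."""
    for data_info in pages:
        for my_url in data_info:
            # return leaves both loops at once
            return my_url['unescapedUrl']
    return None


print(first_url_overall(pages))
```

If only one image is wanted in total, it may be simpler still to request just the first page rather than looping over ten.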

You can either do for myUrl in dataInfo[:1]:

or break after your first successfully downloaded image:

for myUrl in dataInfo:
  count = count + 1
  print myUrl['unescapedUrl']
  try:
    myopener.retrieve(myUrl['unescapedUrl'],str(count)+'.jpg')
    break
  except Exception:
    # Download failed; move on to the next result
    pass
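The "stop at the first successful download" idea can be sketched as a small helper. This is only an illustration under some assumptions: `download_first` and `fake_fetch` are hypothetical names, the sample data stands in for dataInfo, and failures are assumed to surface as IOError. In the real script, `fetch` would be `myopener.retrieve`.

```python
def download_first(results, fetch, filename='1.jpg'):
    """Try each result in order; stop at the first successful download."""
    for result in results:
        url = result['unescapedUrl']
        try:
            fetch(url, filename)
        except IOError:
            # Assumed failure mode: skip this result and try the next one.
            continue
        return url
    return None  # nothing could be downloaded


# Stand-in data and a fake fetcher so the sketch runs without network access.
sample_results = [
    {'unescapedUrl': 'http://example.com/broken.jpg'},
    {'unescapedUrl': 'http://example.com/a.jpg'},
    {'unescapedUrl': 'http://example.com/b.jpg'},
]
downloaded = []


def fake_fetch(url, filename):
    if 'broken' in url:
        raise IOError('simulated download failure')
    downloaded.append((url, filename))


first_ok = download_first(sample_results, fake_fetch)
print(first_ok)
```

Here the first entry fails, so the helper falls through to the second one and then stops without touching the third.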

Just to be clear, dataInfo here contains a list of URLs, and you only want the first one. Is that correct?

If so, rather than looping through dataInfo, you should be able to simply refer to the first (0th) index.

Rather than

for myUrl in dataInfo:
    count = count + 1
    print myUrl['unescapedUrl']

    myopener.retrieve(myUrl['unescapedUrl'],str(count)+'.jpg')

You should then be able to use

myopener.retrieve(dataInfo[0]['unescapedUrl'],'0.jpg')

dataInfo[0]['unescapedUrl'] should be the first URL.
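Note that indexing with [0] raises IndexError when the result list is empty, which could happen if a search returns nothing. A minimal guard might look like the sketch below. `first_url` is a hypothetical helper name and the sample list stands in for dataInfo.

```python
def first_url(results):
    """Return the first result's URL, or None when there are no results."""
    if not results:
        return None
    return results[0]['unescapedUrl']


# Stand-in for dataInfo (hypothetical data).
sample_results = [
    {'unescapedUrl': 'http://example.com/a.jpg'},
    {'unescapedUrl': 'http://example.com/b.jpg'},
]

url = first_url(sample_results)
if url is not None:
    # In the real script this is where myopener.retrieve(url, '0.jpg') would go.
    print(url)
```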
