
IndexError when I run multiple processes

This script throws an IndexError:

from urlparse import urlparse
from multiprocessing.pool import Pool
import re
import urllib2

def btl_test(url):
    page = urllib2.urlopen(url).read()
    page1 =  (re.findall(r'<title>(.*?)<\/title>',page)[0])
    return page1

url = ["http://google.com","http://example.com","http://yahoo.com","http://linkedin.com","http://facebook.com","http://orkut.com","http://oosing.com","http://pinterets.com"]

nprocs = 100 # nprocs is the number of processes to run
ParsePool = Pool(nprocs)
ParsedURLS = ParsePool.map(btl_test,url)
print ParsedURLS

Output:

Traceback (most recent call last):
  File "multithread1.py", line 15, in <module>
    ParsedURLS = ParsePool.map(btl_test,url)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 558, in get
    raise self._value
IndexError: list index out of range

The above is the error message.

Where does the problem occur, and what is the solution?

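There is a chance that the page behind a URL does not have a title tag. In that case re.findall returns an empty list, and indexing it with [0] raises IndexError. The exception is raised inside a worker process and re-raised by map() in the parent, which is why the traceback ends in pool.py rather than in btl_test.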
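The failure can be reproduced without any network access. The snippet below is only an illustrative sketch: the HTML string is made up, and it simply shows what re.findall gives back when no title tag is present.

```python
import re

# Made-up page content with no <title> tag (an assumption for illustration).
html_without_title = "<html><body>no title here</body></html>"

# findall returns a list of captured groups; with no match it is empty.
matches = re.findall(r'<title>(.*?)</title>', html_without_title)
print(matches)

try:
    matches[0]  # indexing an empty list is what triggers the error
except IndexError as exc:
    print("IndexError: %s" % exc)
```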

So change this:

def btl_test(url):
    page = urllib2.urlopen(url).read()
    page1 =  (re.findall(r'<title>(.*?)<\/title>',page)[0])
    return page1

to this:

def btl_test(url):
    page = urllib2.urlopen(url).read()
    # findall returns an empty list when the page has no <title> tag
    page1 = re.findall(r'<title>(.*?)<\/title>', page)
    return page1[0] if page1 else "None"
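As a variation on the fix above, the title lookup can be split into a small helper that works on an HTML string, which makes it easy to try out without fetching anything. This is a minimal sketch, not the original author's code: extract_title, TITLE_RE and the default parameter are hypothetical names, and the case-insensitive, multi-line matching is an assumption about what real pages may contain.

```python
import re

# Hypothetical helper, not part of the original post.
# IGNORECASE accepts <TITLE>; DOTALL lets the title span several lines.
TITLE_RE = re.compile(r'<title[^>]*>(.*?)</title>', re.IGNORECASE | re.DOTALL)

def extract_title(html, default="None"):
    """Return the page title, or `default` when no <title> tag is found."""
    match = TITLE_RE.search(html)
    if match is None:
        return default
    return match.group(1).strip()

print(extract_title("<html><head><title>Example Domain</title></head></html>"))
print(extract_title("<html><body>no title here</body></html>"))
```

Inside btl_test, the return value would then be extract_title(page). Using re.search avoids building a list just to take its first element, and the explicit None check keeps the missing-title case visible.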
