
Text extraction from multiple websites

from bs4 import BeautifulSoup
import urllib2

list_open = open("weblist.txt")
read_list = list_open.read()
line_in_list = read_list.split("\n")

for url in line_in_list:
    html = urllib2.urlopen(url).read()
    soup = BeautifulSoup(html)
    print soup.get_text()

The code above extracts text from multiple websites listed in weblist.txt.

However, when the list contains a link that cannot be opened or parsed, the script raises an error and stops at that link without checking the rest. For example, if I have 10 links and the second one fails to open, the script exits there and never processes links 3 through 10. I want it to check every link in the list, from start to end, and extract text from all the links that can actually be opened and parsed.

Just wrap the body of the loop in a try/except statement, so a failing link is reported and skipped instead of stopping the whole run:

for url in line_in_list:
    try:
        html = urllib2.urlopen(url).read()
        soup = BeautifulSoup(html)
        print soup.get_text()
    except Exception as e:
        # Report the bad link and continue with the next one
        print(e)
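For reference, the same skip-on-failure pattern in Python 3 (where urllib2 became urllib.request) can be sketched as below. The function name `extract_texts` and the injectable `fetch` parameter are illustrative choices, not part of the original post; the `fetch` hook just makes the error-isolation logic easy to test without network access, and the BeautifulSoup text-extraction step is folded into whatever `fetch` returns.

```python
import urllib.request


def extract_texts(urls, fetch=None):
    """Fetch each URL, skipping any that fail, and return {url: text}.

    `fetch` may be overridden (e.g. for testing); by default it reads
    the raw page body with urllib.request. Hypothetical helper for
    illustration, not from the original answer.
    """
    if fetch is None:
        fetch = lambda u: urllib.request.urlopen(u).read().decode(
            "utf-8", errors="replace")
    results = {}
    for url in urls:
        try:
            results[url] = fetch(url)
        except Exception as exc:
            # A bad link is reported and skipped, never fatal.
            print("skipping %s: %s" % (url, exc))
    return results
```

Because the try/except sits inside the loop, one unreachable or unparsable link only costs that single iteration; every remaining URL is still visited.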
