
Python BeautifulSoup doesn't work on a URL

I'm happy to join Stack Overflow :) This is the first time I haven't found an answer to my problem :)

I would like to scrape the "meta description" from a list of URLs (stored in a SQL database).

When I start my script, it gets "killed" without any error message. It is killed while reading the 11th URL.

I ran some tests and identified the URL that causes it: "http://www.les-calories.com/famille-4.html"

So I wrote this test, reducing my code to the minimum:

# encoding=utf8 
from bs4 import BeautifulSoup
import urllib
html = urllib.urlopen("http://www.les-calories.com/famille-4.html").read()
soup = BeautifulSoup(html)

And this code gets "killed" by the shell.


I don't understand why...

Thank you for your help :)

It could be that you haven't specified the parser, in which case do the following:

soup = BeautifulSoup(html, "html.parser")
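Since the goal in the question is the meta description, here is a minimal sketch of how that lookup might look once the parser is named explicitly. The helper name `extract_meta_description` and the sample HTML are made up for illustration, and the sketch assumes the page uses a lowercase `name="description"` attribute.

```python
from bs4 import BeautifulSoup


def extract_meta_description(html):
    """Return the content of <meta name="description">, or None if absent."""
    # Name the parser explicitly so the result does not depend on which
    # optional parsers (lxml, html5lib) happen to be installed.
    soup = BeautifulSoup(html, "html.parser")
    # "name" is also the first positional parameter of find(), so the
    # attribute filter has to go through attrs=.
    tag = soup.find("meta", attrs={"name": "description"})
    if tag is None:
        return None
    return tag.get("content")


# Hypothetical sample document, not taken from the site in the question.
sample = '<html><head><meta name="description" content="Calorie tables"></head></html>'
print(extract_meta_description(sample))
```

Returning `None` for a missing tag lets a loop over many URLs skip pages without a description instead of raising.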

However, I think it is more likely that the HTML page simply contains too much data. I would use the python-requests package and set stream to True in the GET request, like so:

>>> import requests
>>> resp = requests.get("http://www.les-calories.com/famille-4.html", stream=True)
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(resp.text, "html.parser")
>>> soup.find("a")
<a href="http://www.fitadium.com/79-seche-et-definition-musculaire" target="_blank"><img border="0" height="60px" src="http://www.les-calories.com/images/234x60_pack-minceur-brule-graisses.gif" width="234px"/></a>
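One caveat: with `stream=True`, accessing `resp.text` still reads the whole body into memory. If the page size really is the problem, it may help to read the body in chunks and stop at a limit. The following is only a sketch under that assumption; the helper names `read_capped` and `fetch_capped`, the 1 MB limit, and the 10-second timeout are arbitrary choices, not something from the question.

```python
import requests


def read_capped(chunks, max_bytes):
    """Join byte chunks, stopping once max_bytes have been collected."""
    buf = bytearray()
    for chunk in chunks:
        buf.extend(chunk)
        if len(buf) >= max_bytes:
            break
    # The last chunk may overshoot the limit, so trim to exactly max_bytes.
    return bytes(buf[:max_bytes])


def fetch_capped(url, max_bytes=1_000_000, timeout=10):
    """Download at most max_bytes of the response body."""
    # stream=True defers the body download until it is iterated.
    with requests.get(url, stream=True, timeout=timeout) as resp:
        resp.raise_for_status()
        return read_capped(resp.iter_content(chunk_size=8192), max_bytes)
```

The meta description normally sits in the `<head>`, so a truncated body passed to `BeautifulSoup(body, "html.parser")` is usually still enough to find it.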
