
Beautiful Soup returning empty set

Beautiful Soup works fine on my local machine, but not on another server.

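import urllib2
import bs4

# Fetch the page; urlopen returns a file-like response object
response = urllib2.urlopen("http://www.google.com")
html = response.read()
# No parser is named here, so bs4 uses whichever one it finds installed
soup = bs4.BeautifulSoup(html)

print soup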

Printing html outputs Google's web page correctly, but printing soup prints nothing.

Locally it works fine; on this Red Hat machine, however, the soup comes back empty.

Does this have something to do with installing a parser? Other solutions I found mentioned installing a parser, but I have had no luck so far.

The solution in "Beautiful Soup returning nothing" doesn't apply to my problem.
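For context: when no parser is named, Beautiful Soup uses the best one it finds installed, ranking lxml first, then html5lib, then Python's built-in html.parser. Two machines running the same script can therefore end up on different parsers, and naming one explicitly, as in BeautifulSoup(html, "html.parser"), removes that difference. The sketch below, assuming Python 3.4+ for importlib.util.find_spec, guesses which parser that ranking would pick on the current machine. default_parser_guess is a hypothetical helper, not part of bs4, and it only checks that each module can be located, not that it imports cleanly.

```python
import importlib.util

# Beautiful Soup's documented preference order when no parser is specified;
# html.parser ships with Python and is the final fallback.
PREFERENCE = ["lxml", "html5lib"]


def default_parser_guess():
    """Return the parser name bs4 would most likely pick on this machine."""
    for name in PREFERENCE:
        # find_spec returns None when the top-level module cannot be found.
        if importlib.util.find_spec(name) is not None:
            return name
    return "html.parser"


print(default_parser_guess())
```

Running this on both machines and comparing the output is a quick way to see whether they are using different parsers.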

Just to show that your case is unique and has nothing to do with Red Hat:

I launched a micro Red Hat instance on AWS, and here is the complete process, starting from SSH into that brand-new Red Hat machine.

(1) First, I installed beautifulsoup4 on the new machine:

$ ssh -i key.pem ec2-user@awsip
The authenticity of host 'awsip' cant be established.
RSA key fingerprint is ....
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'awsip' (RSA) to the list of known hosts.
[ec2-user@awsip ~]$ sudo easy_install beautifulsoup4
Searching for beautifulsoup4
Reading http://pypi.python.org/simple/beautifulsoup4/
...
Installed /usr/lib/python2.6/site-packages/beautifulsoup4-4.3.2-py2.6.egg
Processing dependencies for beautifulsoup4
Finished processing dependencies for beautifulsoup4
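Because the same bs4 install can sit next to different parser backends on each machine, it helps to compare what is actually installed. The sketch below assumes a Python 3.8+ interpreter (for importlib.metadata); on the Python 2.6 machines shown here, the egg names under site-packages, like the one in the log above, carry the same version information. installed_versions is a hypothetical helper written for illustration.

```python
from importlib.metadata import PackageNotFoundError, version

# PyPI distribution names: bs4 itself plus its optional parser backends.
PACKAGES = ["beautifulsoup4", "lxml", "html5lib"]


def installed_versions(names=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    report = {}
    for name in names:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'not installed'}")
```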

(2) Then I opened Python and fetched Google's page, printing both the raw HTML and the soup:

[ec2-user@awsip ~]$ python
Python 2.6.6 (r266:84292, May 27 2013, 05:35:12)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib2
>>> from bs4 import BeautifulSoup
>>> html = urllib2.urlopen("http://www.google.com").read()
>>> soup = BeautifulSoup(html)
>>> print html[:100]
<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage"><head><meta content="Search t
>>> print soup.prettify()[:100]
<!DOCTYPE html>
<html itemscope="" itemtype="http://schema.org/WebPage">
 <head>
  <meta content="Se

To find out whether the fault lies with urllib2 or bs4, try running this code:

from bs4 import BeautifulSoup

html = """
<html>
<head>
</head>
<body>
<div id="1">numberone</div>
<div id="2">numbertwo</div>
</body>
</html>
"""

print BeautifulSoup(html).find('div', {"id":"1"})

If Beautiful Soup is installed correctly, you should get this output:

<div id="1">numberone</div>
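If that snippet still prints nothing, you can narrow things down further by taking Beautiful Soup out of the picture entirely. The sketch below assumes Python 3 and uses only the standard library's html.parser module; DivById is a hypothetical helper written for illustration, and it does not handle nested divs. If it finds the div but bs4 does not, the HTML is fine and the problem lies in the bs4 installation or the parser backend it picked.

```python
from html.parser import HTMLParser


class DivById(HTMLParser):
    """Collect the text of the <div> whose id attribute equals target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.inside = False
        self.text = None  # stays None if the div is never found

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "div" and dict(attrs).get("id") == self.target_id:
            self.inside = True
            self.text = ""

    def handle_endtag(self, tag):
        if tag == "div":
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.text += data


html = """
<html>
<head>
</head>
<body>
<div id="1">numberone</div>
<div id="2">numbertwo</div>
</body>
</html>
"""

parser = DivById("1")
parser.feed(html)
parser.close()
print(parser.text)  # numberone
```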
