Beautiful Soup returning empty set

Beautiful Soup works fine on my local machine, but not on another server.

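import urllib2
import bs4

# Fetch the page; urlopen returns a file-like response object
response = urllib2.urlopen("http://www.google.com")
html = response.read()

# No parser is named here, so bs4 uses the best parser it finds installed
soup = bs4.BeautifulSoup(html)

print soup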

Printing html outputs Google's web page correctly, but printing soup outputs nothing.
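To confirm that the downloaded string really contains markup, independently of Beautiful Soup, one option is to count its start tags with the standard library's HTML parser. This is a swapped-in stdlib check, not something from the original thread. It is written for Python 3 (in Python 2 the module is called HTMLParser), and count_tags is a hypothetical helper name.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts every start tag the standard-library parser reports."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # Called for <tag ...>, and by default also for self-closing <tag/>
        self.counts[tag] += 1


def count_tags(html):
    """Return a Counter of tag names found in an HTML string (hypothetical helper)."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


if __name__ == "__main__":
    sample = "<html><head><title>t</title></head><body><div>a</div><div>b</div></body></html>"
    print(count_tags(sample))
```

If the count for the raw html is large while the soup is empty, the download is fine and the problem lies in how bs4 parsed it.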

Locally it works fine; on this Red Hat machine, however, the soup is empty.

Does this have something to do with installing a parser? Other solutions I found mentioned installing a parser, but I have had no luck so far.
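According to the Beautiful Soup documentation, when no parser is named, bs4 uses the best one installed, ranking lxml first, then html5lib, then Python's built-in html.parser. Two machines with different packages can therefore parse the same page with different parsers without any visible difference in the code. The sketch below (Python 3, stdlib only; installed_parsers and likely_default_parser are hypothetical names) reports which parser modules are importable and which one bs4 would be expected to pick. It only approximates bs4's internal choice. Run it on both machines and compare.

```python
import importlib.util

# bs4 feature name -> module that must be importable for it,
# listed in the preference order described in the bs4 documentation
PARSER_PREFERENCE = [
    ("lxml", "lxml"),
    ("html5lib", "html5lib"),
    ("html.parser", "html.parser"),  # ships with Python
]


def installed_parsers():
    """Return the bs4 parser names whose backing module can be found."""
    found = []
    for feature, module in PARSER_PREFERENCE:
        if importlib.util.find_spec(module) is not None:
            found.append(feature)
    return found


def likely_default_parser():
    """First installed parser in preference order (an approximation of bs4's choice)."""
    return installed_parsers()[0]


if __name__ == "__main__":
    print("installed:", installed_parsers())
    print("bs4 would likely default to:", likely_default_parser())
```

Naming the parser explicitly, e.g. BeautifulSoup(html, "html.parser"), removes that ambiguity between machines.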

The solution in "Beautiful Soup returning nothing" doesn't apply to my problem.

Just to show you that your case is unique and has nothing to do with Red Hat:

I spun up a micro Red Hat instance on AWS. Here is the complete process, starting from SSH into that brand-new Red Hat machine.

(1) Here I installed beautifulsoup4 on the new machine:

$ ssh -i key.pem ec2-user@awsip
The authenticity of host 'awsip' can't be established.
RSA key fingerprint is ....
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'awsip' (RSA) to the list of known hosts.
[ec2-user@awsip ~]$ sudo easy_install beautifulsoup4
Searching for beautifulsoup4
Reading http://pypi.python.org/simple/beautifulsoup4/
...
Installed /usr/lib/python2.6/site-packages/beautifulsoup4-4.3.2-py2.6.egg
Processing dependencies for beautifulsoup4
Finished processing dependencies for beautifulsoup4
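Since the install log shows beautifulsoup4 4.3.2 being installed for Python 2.6, it can help to record the exact Python and package versions on both the working and the failing machine. A minimal sketch for Python 3.8+, using the standard importlib.metadata module (not something used in the original thread); package_versions is a hypothetical helper.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def package_versions(names=("beautifulsoup4", "lxml", "html5lib")):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None  # not installed on this machine
    return versions


if __name__ == "__main__":
    print("python:", platform.python_version())
    for name, ver in package_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```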

(2) I opened Python and got Google's page both as raw html and as soup:

[ec2-user@awsip ~]$ python
Python 2.6.6 (r266:84292, May 27 2013, 05:35:12)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib2
>>> from bs4 import BeautifulSoup
>>> html = urllib2.urlopen("http://www.google.com").read()
>>> soup = BeautifulSoup(html)
>>> print html[:100]
<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage"><head><meta content="Search t
>>> print soup.prettify()[:100]
<!DOCTYPE html>
<html itemscope="" itemtype="http://schema.org/WebPage">
 <head>
  <meta content="Se

To find out whether urllib2 or bs4 is at fault, try running this code:

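from bs4 import BeautifulSoup

# A small, known-good document: no network access is involved
html = """
<html>
<head>
</head>
<body>
<div id="1">numberone</div>
<div id="2">numbertwo</div>
</body>
</html>
"""

# Should print the first <div>; if it prints None, bs4 (or its parser) is the problem
print BeautifulSoup(html).find('div', {"id": "1"})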

If Beautiful Soup is installed correctly, you will get the expected output:

<div id="1">numberone</div>
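To narrow things down further, the same test can be run with each installed parser named explicitly, so a parser that returns an empty tree stands out. This is a sketch in Python 3 that tolerates bs4 being absent (it then returns an empty dict); compare_parsers is a hypothetical helper, not part of bs4.

```python
import importlib.util

SAMPLE = """
<html>
<head>
</head>
<body>
<div id="1">numberone</div>
<div id="2">numbertwo</div>
</body>
</html>
"""

# bs4 feature name -> module that must be importable for that parser
PARSERS = {"lxml": "lxml", "html5lib": "html5lib", "html.parser": "html.parser"}


def compare_parsers(html=SAMPLE):
    """Parse html with every installed parser; return {parser: result string}."""
    if importlib.util.find_spec("bs4") is None:
        return {}  # Beautiful Soup itself is not installed
    from bs4 import BeautifulSoup

    results = {}
    for feature, module in PARSERS.items():
        if importlib.util.find_spec(module) is None:
            continue  # this parser is not installed; skip it
        try:
            div = BeautifulSoup(html, feature).find("div", {"id": "1"})
            results[feature] = str(div)  # "None" means the tree came back empty
        except Exception as exc:  # e.g. a broken or incompatible parser install
            results[feature] = "error: %s" % exc
    return results


if __name__ == "__main__":
    for feature, result in compare_parsers().items():
        print(feature, "->", result)
```

A parser that prints None or an error here, while html.parser prints the div, is a likely culprit for the empty soup.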
