Retrieving information from the web with Python using urllib and BeautifulSoup

Question

I can fetch an HTML page with urllib and parse it with BeautifulSoup, but it looks like I have to write the page to a file for BeautifulSoup to read.

import urllib
sock = urllib.urlopen("http://SOMEWHERE")
htmlSource = sock.read()
sock.close()
# --> write htmlSource to a file

Is there a way to call BeautifulSoup without first writing the urllib output to a file?

Answer 1

from BeautifulSoup import BeautifulSoup

soup = BeautifulSoup(htmlSource)

No file writing is needed: just pass in the HTML string. You can also pass the object returned from urlopen directly:

f = urllib.urlopen("http://SOMEWHERE") 
soup = BeautifulSoup(f)
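
The snippets above use the Python 2 era APIs (urllib.urlopen and the BeautifulSoup 3 import path). As a rough sketch of the same idea on Python 3, assuming the beautifulsoup4 package is installed, something like the following should work. The helper names soup_from_markup and soup_from_url and the sample markup are made up for illustration and are not part of either library.

```python
import io
from urllib.request import urlopen

from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4


def soup_from_markup(markup):
    """Parse HTML given as a str, bytes, or file-like object (no temp file)."""
    # "html.parser" is the parser bundled with the standard library.
    return BeautifulSoup(markup, "html.parser")


def soup_from_url(url, timeout=10):
    """Hypothetical helper: fetch a page and parse the open response directly."""
    with urlopen(url, timeout=timeout) as response:
        # The response has a .read() method, so BeautifulSoup can consume it as-is.
        return soup_from_markup(response)


# Offline demonstration: made-up markup standing in for a downloaded page.
SAMPLE = b"<html><head><title>Demo</title></head><body><a href='/a'>A</a></body></html>"

from_string = soup_from_markup(SAMPLE.decode("utf-8"))  # plain HTML string
from_stream = soup_from_markup(io.BytesIO(SAMPLE))      # file-like, like urlopen's result

print(from_string.title.string)
print([a["href"] for a in from_stream.find_all("a")])
```

With a real page, soup_from_url("http://SOMEWHERE") would return the parsed document in the same way, without anything being written to disk.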

Answer 2

You could open the URL, download the HTML, and make it parseable in one shot with gazpacho:

from gazpacho import Soup
soup = Soup.get("https://www.example.com/")

Answer 1: 20, accepted, 2010-04-15 16:36:10
Answer 2: 0, 2020-10-09 23:28:13
