Parsing an HTML file on my desktop with Python

Question

Let's say I need to scrape information from this website: http://www.smv.gob.pe/Frm_ValorCuotaDetalle_V2.aspx?in_ac_pre_ope=A&in_ad_fecha=31/01/2017

But since I'm running into proxy problems, what I did was copy the page source from the browser and paste it into a Notepad file called test222.html.

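I want to read it with Beautiful Soup so I can manipulate it, but I really don't know how to do that. The test222.html document is on my desktop. All the code I have right now is...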

from bs4 import BeautifulSoup

web_parsed = 'C:/Users/Desktop/test222.html'

soup = BeautifulSoup(web_parsed, 'html.parser')

print soup

Thanks in advance

Answer 1

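BeautifulSoup expects the HTML markup itself, not a path to a file. If you pass it the path, it just parses the path string as if it were the document. So you need to read the file first: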

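from bs4 import BeautifulSoup

# Read the saved page source from disk; BeautifulSoup parses markup, not paths
with open(r'C:/Users/Desktop/test222.html') as f:
    html = f.read()

soup = BeautifulSoup(html, 'html.parser')
print(soup)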
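As a side note, BeautifulSoup also accepts an open file handle, so the explicit f.read() is optional. Here is a minimal sketch of the difference between passing the path and passing the file. The sample markup and the temporary file are made up for illustration (they are not the real SMV page), and recent bs4 versions may print a warning when the input looks like a file name:

```python
import os
import tempfile

from bs4 import BeautifulSoup

# Hypothetical stand-in for the saved page source (not the real SMV page).
sample_html = (
    "<html><body><table>"
    "<tr><td>Fund A</td><td>1.23</td></tr>"
    "</table></body></html>"
)

# Write the sample to a temporary file so the example runs anywhere.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "test222.html")
with open(path, "w") as f:
    f.write(sample_html)

# Passing the path string: it is treated as markup, so no tags are found.
from_path = BeautifulSoup(path, "html.parser")

# Passing an open file handle: the file contents are read and parsed.
with open(path) as f:
    from_file = BeautifulSoup(f, "html.parser")

cells = [td.get_text() for td in from_file.find_all("td")]

print(from_path.find_all("td"))  # []
print(cells)                     # ['Fund A', '1.23']
```

If the saved page contains accented characters and they come out garbled, passing an explicit encoding to open() (for example encoding='utf-8' on Python 3) is worth a try; which encoding Notepad used for the file is something you would have to check.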