Extracting raw HTML from a locally saved HTML file with BeautifulSoup

Question

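I'm relatively new to BeautifulSoup. I'm trying to get the raw HTML from a locally saved HTML file. I looked around and found that I should probably be using Beautiful Soup for this. However, when I do this: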

from bs4 import BeautifulSoup
url = r"C:\example.html"
soup = BeautifulSoup(url, "html.parser")
text = soup.get_text()
print (text)

an empty string is printed. I think I'm missing a step. Any push in the right direction would be much appreciated.

Answer 1

The first argument to BeautifulSoup is the actual HTML markup as a string, not a URL or a file path. Open the file, read its contents, and pass those in.
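A minimal sketch of that advice, assuming the file is UTF-8 encoded. The helper name `soup_from_file` and the `encoding` default are my own, not part of bs4:

```python
from pathlib import Path

from bs4 import BeautifulSoup


def soup_from_file(path, encoding="utf-8"):
    """Read the file at `path` and parse its contents (hypothetical helper)."""
    # Pass the markup itself to BeautifulSoup, not the path string.
    html = Path(path).read_text(encoding=encoding)
    return BeautifulSoup(html, "html.parser")
```

With the file from the question, this would be called as `soup_from_file(r"C:\example.html").get_text()`.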

Answer 2

Building on the previous answer, there are two ways to open the HTML file and pass it to BeautifulSoup:

1.

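from bs4 import BeautifulSoup
# The with block closes the file once it has been parsed.
with open("example.html") as fp:
    soup = BeautifulSoup(fp, "html.parser")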

2.

# Shorter, but the file handle is never closed explicitly.
soup = BeautifulSoup(open("example.html"), "html.parser")
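Since the question asks for the raw HTML, note that `get_text()` returns only the text content with the tags stripped, while `str(soup)` returns the markup. A rough sketch of the difference, again assuming a UTF-8 file. `markup_and_text` is a hypothetical helper name:

```python
from bs4 import BeautifulSoup


def markup_and_text(path):
    """Return (markup, text) for the HTML file at `path` (hypothetical helper)."""
    # Assumes the file is UTF-8 encoded.
    with open(path, encoding="utf-8") as fp:
        soup = BeautifulSoup(fp, "html.parser")
    # str(soup) keeps the tags; get_text() drops them.
    return str(soup), soup.get_text()
```

`soup.prettify()` is another option if you want the markup re-indented for reading.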