Extracting an unordered list from HTML in Python with BeautifulSoup

Question

I have an HTML file and I want to extract an unordered list from it. I know the class name of that list. I am trying the following code:

soup = BeautifulSoup(HTML(open('dtaa.html').read()).__html__())
soup.find("ul",{"class":"name of class"})

dtaa.html is my file. This does not give me anything. The unordered list is nested inside 2 divs; maybe that is the problem. Thanks in advance.

Answer 1

You can read the HTML file like this:

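from bs4 import BeautifulSoup

# Open the file and pass the file object directly to BeautifulSoup
with open("dtaa.html") as fp:
    soup = BeautifulSoup(fp, "html.parser")

# Returns the matching <ul> tag, or None if nothing matches
soup.find("ul", attrs={"class": "name of class"})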
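On the nesting question: find() searches all descendants, so a list sitting inside two divs should still be found. Here is a minimal sketch to check that. The markup and the class names "menu" and "main" are made up for illustration and stand in for whatever dtaa.html actually contains. Note that a class attribute containing spaces is really several classes, and class_ matches against any one of them.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for dtaa.html: a <ul> nested inside two <div>s.
html = """
<div id="outer">
  <div id="inner">
    <ul class="menu main">
      <li>First</li>
      <li>Second</li>
    </ul>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find() searches all descendants, so the nesting depth does not matter.
ul = soup.find("ul", class_="menu")

# find() returns None when nothing matches, so guard before using the result.
items = [li.get_text(strip=True) for li in ul.find_all("li")] if ul else []
print(items)  # ['First', 'Second']

# A CSS selector can require several classes at once, in any order.
same_ul = soup.select_one("ul.main.menu")
```

If this works on a small sample but not on the real file, the class name in the file probably differs from the one being searched for, or the list is added later by JavaScript and is not in the saved HTML at all.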

You can also try a different parser, for example:

# html5lib is a separate package (pip install html5lib); fp must still be open here
soup = BeautifulSoup(fp, "html5lib")
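If you are not sure which parsers are installed, one option is to try them in order and fall back to the built-in one. This is only a sketch: parse_html is a helper name I made up, and the order of preference is an assumption, not a recommendation from the BeautifulSoup docs. BeautifulSoup raises FeatureNotFound when the requested parser is missing.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def parse_html(markup, parsers=("html5lib", "lxml", "html.parser")):
    """Parse markup with the first parser in `parsers` that is installed.

    Hypothetical helper for illustration; "html.parser" ships with Python,
    so the default tuple always has a working last resort.
    """
    for name in parsers:
        try:
            return BeautifulSoup(markup, name)
        except FeatureNotFound:
            # This parser is not installed; try the next one.
            continue
    raise RuntimeError("none of the requested parsers is installed")


# Made-up sample markup with the list nested inside two divs.
soup = parse_html("<div><div><ul class='menu'><li>One</li></ul></div></div>")
ul = soup.find("ul", class_="menu")
print(ul.li.get_text())  # One
```

Different parsers repair broken HTML differently, so if find() returns None with one parser on a malformed file, it is worth checking whether another one builds a tree that contains the list.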

Documentation: