How do I get a specific table tag from inside a div tag in HTML?

Question

I am trying to get table information from the website http://www.o1vsk.lv/index.php/stundu-izmainas. HTML of the page content I need to extract

from bs4 import BeautifulSoup
from urllib.request import urlopen
html = urlopen("http://www.o1vsk.lv/index.php/stundu-izmainas").read()

rows=[]
soup=BeautifulSoup(html,"html.parser")
box = soup.find('div', {'class': 'DRight'})

This program gets all of the page's content, but I only need one small table in text format, for example:

The table I need to get in text format (6.d)

...
...
...
...
...
...
...
...

Answer 1

Sorry, I can't comment yet because my reputation is below 50.

Here is my solution for you.

Find all table tags inside the div; this returns their HTML code:

table = box.findAll("table")
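
As a minimal offline sketch of this step: the HTML snippet and its cell values below are invented for illustration and are not taken from the real page, and table_rows is a hypothetical helper. It shows that find_all gives back a list-like result that can be indexed, and that the rows of one table can be read out as plain text.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page: a div holding two tables.
SAMPLE_HTML = """
<div class="DRight">
  <table><tr><td>menu</td></tr></table>
  <table>
    <tr><th>Class</th><th>Lesson</th></tr>
    <tr><td>6.d</td><td>Math</td></tr>
  </table>
</div>
"""


def table_rows(table):
    """Return the table as a list of rows, each a list of cell strings."""
    rows = []
    for tr in table.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        rows.append(cells)
    return rows


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
box = soup.find("div", {"class": "DRight"})
tables = box.find_all("table")   # list-like result, one entry per <table>
wanted = table_rows(tables[1])   # index 1 = the second table in the div

# Print the table as tab-separated text, one row per line.
for row in wanted:
    print("\t".join(row))
```

If only plain text is needed, this route avoids pandas entirely. The index used on the real page still has to be found by inspecting the tables that come back.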

Convert the HTML into a pandas DataFrame (df). Why index 1? read_html returns a list of DataFrames, one per table, and the table you want is at index 1.

df = pd.read_html(str(table))[1]

Finally, drop the columns whose names start with Unnamed so that only the columns you need remain:

df.loc[:, ~df.columns.str.match('Unnamed')]
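
To see what that column filter does in isolation, here is a small sketch on a hand-built DataFrame. The column names and values are made up to mimic what read_html can produce when a table has empty header cells; they are assumptions, not data from the real page.

```python
import pandas as pd

# Hypothetical frame with two placeholder columns and two real ones.
df = pd.DataFrame({
    "Unnamed: 0": [None, None],
    "Class": ["6.d", "6.d"],
    "Unnamed: 2": [None, None],
    "Lesson": ["Math", "Sports"],
})

# str.match anchors at the start of each column name, so the mask is
# True for every column whose name begins with "Unnamed".
is_unnamed = df.columns.str.match("Unnamed")

# .loc returns a new frame and leaves df unchanged, so keep the result.
cleaned = df.loc[:, ~is_unnamed]

print(cleaned.to_string(index=False))
```

Note that the expression on its own does not modify df; outside an interactive session the result needs to be assigned or printed.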

Here is the complete code:

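from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup
from urllib.request import urlopen

html = urlopen("http://www.o1vsk.lv/index.php/stundu-izmainas").read()

soup = BeautifulSoup(html, "html.parser")
box = soup.find('div', {'class': 'DRight'})

# All <table> elements inside the div
table = box.findAll("table")
# read_html returns one DataFrame per table; the wanted one is at index 1.
# StringIO is used because newer pandas versions deprecate raw HTML strings.
df = pd.read_html(StringIO(str(table)))[1]

# Keep only the columns whose names do not start with 'Unnamed'
df = df.loc[:, ~df.columns.str.match('Unnamed')]
print(df.to_string(index=False))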

If this helped you, please upvote :) Thanks
