How do I get a specific table tag from inside a div tag in HTML?

Question

I'm trying to get the table information from the website http://www.o1vsk.lv/index.php/stundu-izmainas.

[Image: HTML content of the web page I need to extract]

from bs4 import BeautifulSoup
from urllib.request import urlopen
html = urlopen("http://www.o1vsk.lv/index.php/stundu-izmainas").read()

rows=[]
soup=BeautifulSoup(html,"html.parser")
box = soup.find('div', {'class': 'DRight'})

This program gets all the content of the page, while I need only one small table in text format, like:

[Image: the table I need to get in text format]

6.d

... ...
... ...
... ...
... ...
... ...
... ...
... ...
... ...

Answer 1

Sorry, I cannot comment yet because my reputation is below 50.

Here is my solution for you.

Find all the table tags inside the div; this returns their HTML:

table = box.findAll("table")
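If you only want the text and would rather not use pandas, the cells can be read straight from the BeautifulSoup tags. The following is a minimal sketch, not the real page: `SAMPLE_HTML` is hypothetical markup that assumes the `DRight` div holds two tables with the wanted one second, and `table_rows` is a helper name made up for this illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page (assumption): a div with
# class "DRight" that holds two tables, the second being the one we want.
SAMPLE_HTML = """
<div class="DLeft"><table><tr><td>menu</td></tr></table></div>
<div class="DRight">
  <table><tr><td>header table</td></tr></table>
  <table>
    <tr><th>Lesson</th><th>Subject</th></tr>
    <tr><td>1</td><td>Math</td></tr>
    <tr><td>2</td><td>History</td></tr>
  </table>
</div>
"""


def table_rows(table):
    """Return the table as a list of rows, each a list of cell strings."""
    rows = []
    for tr in table.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        rows.append(cells)
    return rows


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
box = soup.find("div", {"class": "DRight"})
tables = box.find_all("table")    # only the tables inside the DRight div
wanted = table_rows(tables[1])    # index 1 = second table in that div

# Print each row as tab-separated text
for row in wanted:
    print("\t".join(row))
```

Searching from `box` rather than from `soup` is what keeps tables elsewhere on the page out of the result.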

Convert the HTML to a pandas DataFrame (df). Why index 1? Because pd.read_html returns a list of DataFrames, one per table, and the table you want is at index 1:

df = pd.read_html(str(table))[1]
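To see why an index is needed, here is a small sketch on made-up HTML (the markup and its contents are assumptions, not the real page). It wraps the string in `StringIO`, since recent pandas versions warn when `read_html` is given a literal HTML string, and it assumes an HTML parser that pandas supports, such as lxml, is installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical HTML with two tables (assumption, not the real page content)
SAMPLE_HTML = """
<table>
  <tr><td>a</td><td>b</td></tr>
  <tr><td>c</td><td>d</td></tr>
</table>
<table>
  <tr><th>Lesson</th><th>Subject</th></tr>
  <tr><td>1</td><td>Math</td></tr>
  <tr><td>2</td><td>History</td></tr>
</table>
"""

# read_html returns a list with one DataFrame per <table> it finds
tables = pd.read_html(StringIO(SAMPLE_HTML))
print(len(tables))

# Index 1 picks the second table
df = tables[1]
print(df)
```

If the position of the table on the real page changes, the index has to change with it, so it is worth printing `len(tables)` and inspecting each entry once.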

Lastly, remove the Unnamed columns to keep only the columns you need:

df.loc[:, ~df.columns.str.match('Unnamed')]
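As a quick illustration of that filter, the sketch below builds a small DataFrame by hand; the column names and values are invented for the example. `str.match` tests each column name against the pattern from the start of the string, and `~` inverts the resulting boolean mask.

```python
import pandas as pd

# Made-up frame imitating what read_html can return for a table
# with empty header cells (assumption for illustration)
df = pd.DataFrame({
    "Unnamed: 0": [None, None],
    "Lesson": [1, 2],
    "Subject": ["Math", "History"],
    "Unnamed: 3": [None, None],
})

# True for every column whose name does NOT start with "Unnamed"
keep = ~df.columns.str.match("Unnamed")

# .loc returns a new frame; df itself is left unchanged
cleaned = df.loc[:, keep]
print(cleaned.to_string(index=False))
```

Note that the expression does not modify `df` in place, so the result has to be assigned to a name if you want to keep it.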

Here is the full code:

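from io import StringIO
from urllib.request import urlopen

import pandas as pd
from bs4 import BeautifulSoup

# Download the page
html = urlopen("http://www.o1vsk.lv/index.php/stundu-izmainas").read()

soup = BeautifulSoup(html, "html.parser")
# Narrow the search to the div that holds the tables
box = soup.find('div', {'class': 'DRight'})

# All tables inside that div; the wanted one is at index 1
table = box.findAll("table")
df = pd.read_html(StringIO(str(table)))[1]

# Drop the "Unnamed" columns and print the result as text
df = df.loc[:, ~df.columns.str.match('Unnamed')]
print(df.to_string(index=False))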

Please upvote if this helps you :) Thanks.

Answer 1: accepted, 2022-08-10 15:45:35
