Extracting a specific value from a table using Beautiful Soup (Python)

Question

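I have looked around Stack Overflow, and most guides seem to focus on extracting all of the data from a table. However, I only need to extract one value, and I can't seem to pull that specific value out of the table.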

Link being scraped:

https://gis.vgsi.com/portsmouthnh/Parcel.aspx?pid=38919

I want to extract the "Style" value from the table at that link.

Code:

import bs4
import requests

styleData=[]

pagedata = requests.get("https://gis.vgsi.com/portsmouthnh/Parcel.aspx?pid=38919") 
cleanpagedata = bs4.BeautifulSoup(pagedata.text, 'html.parser') 

table=cleanpagedata.find('div',{'id':'MainContent_ctl01_panView'})
style=table.findall('tr')[3]
style=style.findall('td')[1].text
print(style)
styleData.append(style)

Answer 1

You have probably misspelled the find_all function (the code in the question calls findall). Try this instead:

style=table.find_all('tr')[3]
style=style.find_all('td')[1].text
print(style)

It should give you the expected output.
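To see why the spelling matters, here is a minimal offline sketch. The HTML is a made-up stand-in for the real page (only the div id and the "Townhouse End" value come from this thread; the other rows are placeholders), so treat it as an illustration rather than the page's actual markup.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the real page's structure may differ.
SAMPLE_HTML = """
<div id="MainContent_ctl01_panView">
  <table>
    <tr><th>Field</th><th>Description</th></tr>
    <tr><td>Row 2</td><td>placeholder</td></tr>
    <tr><td>Row 3</td><td>placeholder</td></tr>
    <tr><td>Style:</td><td>Townhouse End</td></tr>
  </table>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = soup.find("div", {"id": "MainContent_ctl01_panView"})

# An unknown attribute name on a Tag is treated as a child-tag lookup,
# so `table.findall` searches for a <findall> element and yields None.
# Calling that result, as in table.findall('tr'), then fails.
misspelled = table.findall

# The correctly spelled method returns the list of matching rows.
rows = table.find_all("tr")
style = rows[3].find_all("td")[1].text
print(style)
```

Because the misspelled name silently resolves to None, the error only shows up at the call, which can make the typo harder to spot.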

Answer 2

You could also do something like this:

import bs4 
import requests
style_data = []
url = "https://gis.vgsi.com/portsmouthnh/Parcel.aspx?pid=38919"

soup = bs4.BeautifulSoup(requests.get(url).content, 'html.parser')
# select the first `td` tag whose text contains the substring `Style:`.
row = soup.select_one('td:-soup-contains("Style:")')
if row:
    # if that cell was found, get its sibling, which should be the value you want
    home_style_tag = row.next_sibling
    style_data.append(home_style_tag.text)

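Some notes

This uses a CSS selector rather than the find methods. See the SoupSieve documentation for details.
select_one relies on the table always being ordered in a certain way. If that is not the case, use select and iterate over the results to find the bs4.Tag whose text is exactly 'Style:', then get its next sibling.

Using select: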

rows = soup.select('td:-soup-contains("Style:")')
# keep only the cells whose text is exactly 'Style:'
row = [r for r in rows if r.text == 'Style:']
if row:
    home_style_text = row[0].next_sibling.text
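The exact-match idea from the notes above can be wrapped in a small helper. This is only a sketch: `value_for_label` is a hypothetical name, the sample HTML is made up, and `find_next_sibling("td")` is swapped in for `next_sibling` so that whitespace between cells is skipped.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: "Roof Style:" also contains the substring "Style:",
# and there is whitespace between the cells.
SAMPLE_HTML = """
<table>
  <tr><td>Roof Style:</td> <td>placeholder</td></tr>
  <tr><td>Style:</td> <td>Townhouse End</td></tr>
</table>
"""


def value_for_label(soup, label):
    """Return the text of the <td> after the cell whose text is exactly `label`."""
    for cell in soup.select(f'td:-soup-contains("{label}")'):
        # The selector matches substrings, so filter for an exact match.
        if cell.get_text(strip=True) == label:
            sibling = cell.find_next_sibling("td")
            return sibling.get_text(strip=True) if sibling else None
    return None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(value_for_label(soup, "Style:"))
```

Returning None when the label is absent lets the caller decide how to handle a page that lacks the field.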

Answer 3

You can use a CSS selector:

#MainContent_ctl01_grdCns tr:nth-of-type(4) td:nth-of-type(2)

This selects the element with the id "MainContent_ctl01_grdCns", then its fourth <tr>, then that row's second <td>.

To use CSS selectors, call the .select() method instead of find_all(), or select_one() instead of find().

import requests
from bs4 import BeautifulSoup


URL = "https://gis.vgsi.com/portsmouthnh/Parcel.aspx?pid=38919"

soup = BeautifulSoup(requests.get(URL).content, "html.parser")
print(
    soup.select_one(
        "#MainContent_ctl01_grdCns tr:nth-of-type(4)  td:nth-of-type(2)"
    ).text
)

Output:

Townhouse End
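A positional selector like this depends on the row order, and select_one() returns None when nothing matches. The sketch below runs the same selector against made-up sample HTML (the placeholder rows are assumptions, not the real page) and guards against the missing case before reading the text.

```python
from bs4 import BeautifulSoup

# Hypothetical table that reuses the id from the answer above.
SAMPLE_HTML = """
<table id="MainContent_ctl01_grdCns">
  <tr><th>Field</th><th>Description</th></tr>
  <tr><td>Row 2</td><td>placeholder</td></tr>
  <tr><td>Row 3</td><td>placeholder</td></tr>
  <tr><td>Style:</td><td>Townhouse End</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Fourth <tr> inside the table, then the second <td> of that row.
cell = soup.select_one(
    "#MainContent_ctl01_grdCns tr:nth-of-type(4) td:nth-of-type(2)"
)
style = cell.get_text(strip=True) if cell is not None else None

# A row index that does not exist yields None rather than raising.
missing = soup.select_one(
    "#MainContent_ctl01_grdCns tr:nth-of-type(9) td:nth-of-type(2)"
)
print(style, missing)
```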

Answer 4

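You can use :contains on td to get the node whose innerText contains "Style", then use the adjacent sibling combinator with a td type selector to get the neighbouring td's value.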

import bs4, requests

pagedata = requests.get("https://gis.vgsi.com/portsmouthnh/Parcel.aspx?pid=38919") 
cleanpagedata = bs4.BeautifulSoup(pagedata.text, 'html.parser') 
print(cleanpagedata.select_one('td:contains("Style") + td').text)
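Newer SoupSieve releases (2.1 and later) treat :contains as a deprecated alias of :-soup-contains. Below is a minimal offline sketch of the same adjacent-sibling technique using the newer spelling; the HTML is a made-up sample, not the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical two-row table standing in for the real page.
SAMPLE_HTML = """
<table>
  <tr><td>Model</td> <td>placeholder</td></tr>
  <tr><td>Style:</td> <td>Townhouse End</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# `A + B` matches a B element that immediately follows an A element;
# text nodes (such as whitespace) between the two cells are ignored.
cell = soup.select_one('td:-soup-contains("Style") + td')
style = cell.text if cell is not None else None
print(style)
```

Since the combinator only considers element siblings, this form is not affected by whitespace between the cells the way next_sibling can be.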

4 answers

Answer 1
1 2021-01-06 19:33:29

Answer 2
1 2021-01-06 20:08:41

Answer 3
1 2021-01-06 21:12:09

Answer 4
1 2021-01-07 05:05:29
