
How to parse a specific HTML tag using BeautifulSoup?

I am trying to scrape this website: https://datausa.io/profile/university/cuny-city-college/

My code only retrieves the first div with the matching class, which is the tuition figure, but I only want the Room and Board cost. How do I target that specific tag?

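import requests
from bs4 import BeautifulSoup

response = requests.get('https://datausa.io/profile/university/cuny-city-college/')
soup = BeautifulSoup(response.text, 'html.parser')

# find() returns only the FIRST matching element (the tuition figure)
rb = soup.find('div', class_='stat-value')

print(rb.prettify())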
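To see why the snippet above always lands on tuition, here is a minimal offline sketch contrasting find() with find_all(). The markup is hypothetical, modelled on the structure the answers below rely on (a .stat-value div followed by a .stat-title div); the dollar figures are placeholders, not real data from the site.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: each stat block holds a .stat-value followed by a
# .stat-title. The dollar amounts are placeholders, not real data.
SAMPLE_HTML = """
<div class="stat">
  <div class="stat-value">$1,000</div>
  <div class="stat-title">Tuition</div>
</div>
<div class="stat">
  <div class="stat-value">$2,000</div>
  <div class="stat-title">Room and Board</div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find() stops at the first match in document order.
first = soup.find("div", class_="stat-value")
print(first.get_text(strip=True))  # $1,000

# find_all() returns every match in document order.
values = soup.find_all("div", class_="stat-value")
titles = soup.find_all("div", class_="stat-title")

# Pair each label with its value (assumes the two lists line up 1:1).
stats = {t.get_text(strip=True): v.get_text(strip=True) for t, v in zip(titles, values)}
print(stats["Room and Board"])  # $2,000
```

Zipping the two lists is only safe if every value has exactly one label; the answers below avoid that assumption by anchoring on the label text.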

Use find() on the stat-title div and pass the label text, so it locates the "Room and Board" label. The value sits in the tag just before that label, so call find_previous() on the result to step back to it.

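import requests
from bs4 import BeautifulSoup

response = requests.get('https://datausa.io/profile/university/cuny-city-college/')
soup = BeautifulSoup(response.text, 'html.parser')

# Locate the "Room and Board" label, then step back to the value div before it
rb = soup.find('div', class_='stat-title', string="Room and Board").find_previous('div', class_='stat-value')
print(rb.get_text())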

Output:

$15,406
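The same find-then-step-back idea can be wrapped so a missing label returns None instead of raising AttributeError. This is a sketch against hypothetical markup (structure inferred from the answers, placeholder figures); value_for_title is an illustrative helper name, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with placeholder figures, not real data.
SAMPLE_HTML = """
<div class="stat">
  <div class="stat-value">$1,000</div>
  <div class="stat-title">Tuition</div>
</div>
<div class="stat">
  <div class="stat-value">$2,000</div>
  <div class="stat-title">Room and Board</div>
</div>
"""


def value_for_title(soup, title):
    """Hypothetical helper: return the text of the .stat-value that precedes
    the .stat-title whose text is exactly `title`, or None if absent."""
    # string= (the newer name for the text= argument) requires the tag's
    # whole string to equal `title`, so this is an exact match.
    label = soup.find("div", class_="stat-title", string=title)
    if label is None:
        return None
    # Walk backwards through the document to the nearest .stat-value div.
    value = label.find_previous("div", class_="stat-value")
    return value.get_text(strip=True) if value else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(value_for_title(soup, "Room and Board"))  # $2,000
print(value_for_title(soup, "Room"))            # None (exact match only)
```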

You can use :has, :-soup-contains, and the adjacent sibling combinator (+) to select the stat-value element whose immediately following sibling is a stat-title containing the text "Room and Board":

import requests
from bs4 import BeautifulSoup as bs

soup = bs(requests.get('https://datausa.io/profile/university/cuny-city-college/').text, 'html.parser')
# Match the .stat-value whose next sibling is a .stat-title containing "Room and Board"
print(soup.select_one('.stat-value:has(+ .stat-title:-soup-contains("Room and Board"))').text)
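One behavioural difference from the find(string=...) answer is worth checking: :-soup-contains() is a substring test, so a partial label also matches. The sketch below uses hypothetical markup with placeholder figures and assumes a Soup Sieve version recent enough to support :-soup-contains (2.1 or later).

```python
from bs4 import BeautifulSoup

# Hypothetical markup with placeholder figures, not real data.
SAMPLE_HTML = """
<div class="stat">
  <div class="stat-value">$1,000</div>
  <div class="stat-title">Tuition</div>
</div>
<div class="stat">
  <div class="stat-value">$2,000</div>
  <div class="stat-title">Room and Board</div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# .stat-value whose immediately following element sibling is a .stat-title
# whose text contains the given substring.
selector = '.stat-value:has(+ .stat-title:-soup-contains("{}"))'

room = soup.select_one(selector.format("Room and Board"))
print(room.get_text(strip=True))  # $2,000

# Substring match: "Board" alone also finds the Room and Board value.
partial = soup.select_one(selector.format("Board"))
print(partial.get_text(strip=True))  # $2,000

# select_one() returns None when nothing matches, so guard before .text.
missing = soup.select_one(selector.format("Parking"))
print(missing)  # None
```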

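Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.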

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM