
BeautifulSoup - can't find attribute

I'm trying to scrape this link. I want to get to this part here:

[screenshot of the target section of the page]

I can see where this part of the website is when I inspect the page:

[screenshot of the browser inspector]

But I can't get to it from BeautifulSoup. Here is the code that I'm using, and all the ways I've tried to access it:

from bs4 import BeautifulSoup
import requests

link = 'https://www.sports-reference.com/cbb/players/temetrius-morant-1.html'
html_text = requests.get(link).text
soup = BeautifulSoup(html_text, 'html.parser')

soup.find_all(class_='data_grid')
soup.find_all(string="data_grid")
soup.find_all(attrs={"class": "data_grid"})

Also, when I just look at the html, I can see that it is there:

[screenshot of the raw html]

If you're looking for the points section, I suggest searching by id, like this:

point_section=soup.find("div",{"id":"leaderboard_pts"})

You need to look at the actual source html that you get in the response (not the html you inspect in the browser, which you have shown you've done). You'll notice those tables are inside html comments, i.e. between <!-- and -->. BeautifulSoup ignores comments.

There are a few ways to go about it. BeautifulSoup does have a way to search for and pull out comments, but with this particular site I find it easier to simply remove the comment tags.
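For reference, the comment-extraction approach mentioned above can be sketched with bs4's `Comment` node type; the sample html below is illustrative (not taken from the actual site), mimicking a table hidden inside a comment:

```python
from bs4 import BeautifulSoup, Comment

# Illustrative html: the table sits inside an html comment,
# similar to how the stats tables are served on the real page
html = '<div id="leaderboard_pts"><!-- <table><tr><td>808 (9th)</td></tr></table> --></div>'

soup = BeautifulSoup(html, 'html.parser')

# BeautifulSoup skips comments during a normal search...
print(soup.find('table'))  # None

# ...but comments can be located explicitly and re-parsed
comment = soup.find(string=lambda text: isinstance(text, Comment))
inner = BeautifulSoup(comment, 'html.parser')
table = inner.find('table')
print(table.td.get_text())  # 808 (9th)
```

This works, but on a page with many commented-out tables you end up re-parsing each comment individually, which is why stripping the comment markers up front is simpler here.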

Once you do that, you can easily parse the html with BeautifulSoup to get the desired <div> tag, then just let pandas parse the <table> tag within it.

import requests
import pandas as pd
from bs4 import BeautifulSoup

url = 'https://www.sports-reference.com/cbb/players/temetrius-morant-1.html'
response = requests.get(url)
html = response.text

# The stats tables are wrapped in html comments; strip the comment
# markers so BeautifulSoup parses them as regular markup
html = html.replace('<!--', '').replace('-->', '')

soup = BeautifulSoup(html, 'html.parser')
leaderboard_pts = soup.find('div', {'id': 'leaderboard_pts'})

# Let pandas parse the <table> inside the div
df = pd.read_html(str(leaderboard_pts))[0]

Output:

print(df)
                        0
0  2017-18 OVC 405 (18th)
1  2018-19 NCAA 808 (9th)
2   2018-19 OVC 808 (1st)
