
Scraping webpage in Python can't find text

I am trying to scrape this webpage using BeautifulSoup in Canopy 1.6.1. What I want to return is the "College Dominator" rating as well as the "Breakout Age" rating. I think it isn't working because the content is generated by JavaScript, but I don't know how to find that information in the scraped data. Please help!

The page you provided is rendered in the browser with JavaScript (Angular). The HTML the server actually sends doesn't include the "College Dominator" information, so you'll have to render the page before you can parse it. I'd recommend a library better suited to client-side rendered pages; Requests-HTML is one such option. With that library you could get your results like this:

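from requests_html import HTMLSession

session = HTMLSession()
r = session.get("https://www.playerprofiler.com/nfl/larry-fitzgerald/")
r.html.render()  # run the page's JavaScript in headless Chromium (downloaded on first use)
# Match the rendered text against a template and capture the two named fields
college_dominator = r.html.search("College Dominator {percentage}% ({rank}th)")
# {"rank": 96, "percentage": 51.3}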
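The question also asks for "Breakout Age". Once the page text has been rendered, both ratings could be pulled out with one small helper. The following is only a sketch using the standard `re` module: `extract_metric` and `SAMPLE_TEXT` are hypothetical names, the "Breakout Age" line and its numbers are made up for illustration, and the assumed "label value (rank)" layout may not match what the real page shows.

```python
import re

# Hypothetical sample of rendered page text. The "College Dominator" line
# follows the pattern used in the answer; the "Breakout Age" line is an
# assumed format with made-up values.
SAMPLE_TEXT = """
College Dominator 51.3% (96th)
Breakout Age 18.3 (98th)
"""


def extract_metric(text, label):
    """Return (value, percentile_rank) for a labeled metric, or None if absent."""
    pattern = (
        re.escape(label)
        # number, optional percent sign, then a rank such as "(96th)"
        + r"\s+(?P<value>\d+(?:\.\d+)?)%?\s+\((?P<rank>\d+)(?:st|nd|rd|th)\)"
    )
    match = re.search(pattern, text)
    if match is None:
        return None
    return float(match.group("value")), int(match.group("rank"))


if __name__ == "__main__":
    for label in ("College Dominator", "Breakout Age"):
        print(label, extract_metric(SAMPLE_TEXT, label))
```

With Requests-HTML, the text to search would presumably be `r.html.text` after `r.html.render()` has run. Returning `None` for a missing label makes it easy to tell "not rendered yet" apart from a parsed value.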
