Python Requests-HTML - Can't find specific data

I am trying to scrape a web page using the Python requests-html library.
The link to the page is https://www.koyfin.com/charts/g/USADebt2GDP?view=table . The image below shows the data I want to get (circled in red).

[image]

My code looks like this:

from requests_html import HTMLSession

session = HTMLSession()
r = session.get('https://www.koyfin.com/charts/g/USADebt2GDP?view=table')
r.html.render(timeout=60)
print(r.text)

The page's HTML looks like this:

[image]

The problem is that when I scrape the page I can't find the data I want, even though in the browser's HTML I can see it inside the first div tag in the body section. Any specific suggestions for how to solve this?

Thanks.

The problem is that the data is loaded by JavaScript after the initial page load. One solution is to use Selenium to drive a web browser to scrape the page. However, looking at the network requests made in a regular browser, it appears that the data you seek is loaded with the following AJAX call:

https://api.koyfin.com/api/v2/commands/g/g.gec/USADebt2GDP?dateFrom=2010-08-20&dateTo=2020-09-05&period=yearly

So:

import requests

# Call the JSON endpoint directly instead of rendering the page.
response = requests.get('https://api.koyfin.com/api/v2/commands/g/g.gec/USADebt2GDP?dateFrom=2010-08-20&dateTo=2020-09-05&period=yearly')
results = response.json()
print(results)
# Each row is a [date, value] pair.
for t in results['graph']['data']:
    print(t)

Prints:

{'ticker': 'USADebt2GDP', 'companyName': 'United States Gross Federal Debt to GDP', 'startDate': '1940-12-31T00:00:00.000Z', 'endDate': '2019-12-31T00:00:00.000Z', 'unit': 'percent', 'graph': {'column_names': ['Date', 'Volume'], 'data': [['2010-12-31', 91.4], ['2011-12-31', 96], ['2012-12-31', 100.1], ['2013-12-31', 101.2], ['2014-12-31', 103.2], ['2015-12-31', 100.8], ['2016-12-31', 105.8], ['2017-12-31', 105.4], ['2018-12-31', 106.1], ['2019-12-31', 106.9]]}, 'withoutLiveData': True}
['2010-12-31', 91.4]
['2011-12-31', 96]
['2012-12-31', 100.1]
['2013-12-31', 101.2]
['2014-12-31', 103.2]
['2015-12-31', 100.8]
['2016-12-31', 105.8]
['2017-12-31', 105.4]
['2018-12-31', 106.1]
['2019-12-31', 106.9]
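
If labelled records are easier to work with than bare lists, the payload can be reshaped using its own `column_names` entry. This is only a minimal sketch, assuming the response keeps the shape shown in the output above. `payload_to_records` is a hypothetical helper name, and the sample is trimmed from that output rather than fetched live:

```python
def payload_to_records(payload):
    """Pair each data row with the column names shipped in the same payload."""
    graph = payload["graph"]
    columns = graph["column_names"]  # e.g. ['Date', 'Volume']
    return [dict(zip(columns, row)) for row in graph["data"]]


# Sample trimmed from the printed output above (not fetched live).
sample = {
    "ticker": "USADebt2GDP",
    "unit": "percent",
    "graph": {
        "column_names": ["Date", "Volume"],
        "data": [["2010-12-31", 91.4], ["2011-12-31", 96], ["2012-12-31", 100.1]],
    },
}

records = payload_to_records(sample)
for record in records:
    print(record["Date"], record["Volume"])
```

With a live response, `payload_to_records(response.json())` would be the equivalent call, provided the API still returns the same structure.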

How I Came Up with the URL

[image]

And when you click on the last message:

[image]
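
Once a URL like the one above has been copied from the browser's network tab, it can be split into its base and query parameters so that the date range is easy to change. Here is a sketch using only the standard library. `with_dates` is a hypothetical helper name, and whether the API accepts other date ranges is an assumption that was not verified here:

```python
from urllib.parse import urlsplit, parse_qsl, urlencode, urlunsplit

API_URL = (
    "https://api.koyfin.com/api/v2/commands/g/g.gec/USADebt2GDP"
    "?dateFrom=2010-08-20&dateTo=2020-09-05&period=yearly"
)


def with_dates(url, date_from, date_to):
    """Return the same URL with dateFrom/dateTo replaced, other params kept."""
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query))  # keeps the original parameter order
    params["dateFrom"] = date_from
    params["dateTo"] = date_to
    return urlunsplit(parts._replace(query=urlencode(params)))


new_url = with_dates(API_URL, "2000-01-01", "2020-09-05")
print(new_url)
```

The resulting string can be passed to `requests.get` in the same way as the hard-coded URL in the answer's code.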
