Beautiful Soup and extracting a div by ID
I am trying to extract the number of "confirmados" (confirmed) cases of COVID-19 from this page: https://coronavirus.gob.mx/datos/
This is my line of code:
table_div = soup.find('div', {"id": "gsPosDIV"})
but it is not working. I am a real neophyte at web scraping. What is the correct way to extract this data?
This is the HTML:
<div id="gsPosDIV" class="h5 mb-0 font-weight-bold text-gray-800">47,144</div>
The data is loaded dynamically via JavaScript, so it is not present in the HTML that Beautiful Soup parses. You can simulate the JavaScript request with the requests module and then parse the response with the re module:
import re
import requests

# Form fields for the request that the page's JavaScript makes
data = {
    'sPatType': 'Confirmados',
    'cve': '000',
    'nom': 'Nacional',
}
url = 'https://coronavirus.gob.mx/datos/Overview/info/getInfo.php'

# The response is JavaScript source, not HTML, so search the raw text
raw_data = requests.post(url, data=data).text
positivos = re.search(r'document\.getElementById\("gsPosDIV"\)\.innerHTML = \((\d+)', raw_data).group(1)
print(positivos)
Prints:
47144
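The snippet above raises an AttributeError if the pattern is not found, for example if the site changes its response. A minimal sketch of a more defensive version of the parsing step follows. The helper name extract_confirmed and the sample string are hypothetical: the sample was written only to match the regular expression from the answer, and the real endpoint's output may look different.

```python
import re

# Pattern taken from the answer above: the digits that follow
# document.getElementById("gsPosDIV").innerHTML = (
PATTERN = re.compile(r'document\.getElementById\("gsPosDIV"\)\.innerHTML = \((\d+)')


def extract_confirmed(raw_js):
    """Return the confirmed-case count as an int, or None if the pattern is absent."""
    match = PATTERN.search(raw_js)
    return int(match.group(1)) if match else None


# Hypothetical response fragment (an assumption, not captured from the site).
sample = 'document.getElementById("gsPosDIV").innerHTML = (47144).toLocaleString();'

print(extract_confirmed(sample))                       # 47144
print(extract_confirmed("<html>no data here</html>"))  # None
```

In the original script, the text passed to the helper would be raw_data, so the network call and the parsing can be checked separately.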