
Beautiful Soup and extracting a div by ID

I am trying to extract the number of "confirmados" (confirmed) COVID-19 cases from this page: https://coronavirus.gob.mx/datos/

This is my line of code: table_div = soup.find('div', {"id": "gsPosDIV"}), but it is not working. I am a real neophyte at web scraping. What is the correct way to extract this data?

This is the HTML: <div id="gsPosDIV" class="h5 mb-0 font-weight-bold text-gray-800">47,144</div>
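For comparison, the find() call in the question is valid in itself: run against HTML that actually contains the populated div, it returns the tag, and get_text() yields the number. The sketch below assumes the snippet quoted above is the input and uses Python's built-in "html.parser" backend. Against the page as downloaded outside a browser, the div is presumably filled in later by JavaScript, so the same call would not find the number there.

```python
from bs4 import BeautifulSoup

# The markup quoted in the question (as seen in the browser after JavaScript ran)
html = '<div id="gsPosDIV" class="h5 mb-0 font-weight-bold text-gray-800">47,144</div>'
soup = BeautifulSoup(html, "html.parser")

# find() returns a Tag or None; the count is the tag's text, not the tag itself
table_div = soup.find("div", {"id": "gsPosDIV"})
if table_div is None:
    raise ValueError("div#gsPosDIV not present in this HTML")

count_text = table_div.get_text(strip=True)
confirmed = int(count_text.replace(",", ""))  # drop the thousands separator
print(count_text, confirmed)  # 47,144 47144
```

If find() returns None on the real page, that is a sign the element or its content is generated client-side.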

The data is loaded dynamically via JavaScript, so the number is most likely not in the HTML that a plain HTTP request downloads, and BeautifulSoup has nothing to find. You can simulate the JavaScript request with the requests module and then parse the data with the re module:

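import re
import requests

# Form fields the page's JavaScript sends to request the national confirmed-case figures
data = {'sPatType': 'Confirmados',
        'cve': '000',
        'nom': 'Nacional'}

url = 'https://coronavirus.gob.mx/datos/Overview/info/getInfo.php'

response = requests.post(url, data=data)
response.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page
raw_data = response.text

# The response contains JavaScript that assigns the count to gsPosDIV's innerHTML; capture the digits
match = re.search(r'document\.getElementById\("gsPosDIV"\)\.innerHTML = \((\d+)', raw_data)
if match is None:
    raise ValueError('gsPosDIV value not found in response')
positivos = match.group(1)
print(positivos)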

Prints:

47144
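To make the regex step easier to test without hitting the server, it can be pulled into a small function. The helper name extract_inner_number is hypothetical, and the sample string is made up in the shape the answer's regex implies (only the "document.getElementById("gsPosDIV").innerHTML = (" prefix comes from the answer; the trailing ".toLocaleString(...)" part is invented). The real getInfo.php response is not shown here and may differ.

```python
import re

def extract_inner_number(js_text: str, element_id: str) -> int:
    """Return the integer assigned to element_id's innerHTML in js_text.

    Hypothetical helper: assumes the script contains a statement shaped like
    document.getElementById("<id>").innerHTML = (<digits>...
    """
    pattern = (r'document\.getElementById\("' + re.escape(element_id)
               + r'"\)\.innerHTML = \((\d+)')
    match = re.search(pattern, js_text)
    if match is None:
        raise ValueError(f"no innerHTML assignment found for {element_id!r}")
    return int(match.group(1))

# Made-up sample in the shape the answer's regex expects (not real server output)
sample = 'document.getElementById("gsPosDIV").innerHTML = (47144).toLocaleString("en");'
print(extract_inner_number(sample, "gsPosDIV"))  # 47144
```

Using re.escape on the ID keeps the pattern correct if you reuse the helper for other element IDs that contain regex metacharacters.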

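Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original source. For any questions, contact yoyou2525@163.com.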

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM