
Python beautifulsoup, scraping a table in a website

I recently got interested in web scraping with the Python library beautifulsoup4. My goal is to get the data on COVID-19 cases (Morocco is a good start). The website with the data is "https://www.worldometers.info/coronavirus/". It has a big table with all the info, and I tried something like this:

import requests
from bs4 import BeautifulSoup

U = 'https://www.worldometers.info/coronavirus/'
response = requests.get(U)
html_soup = BeautifulSoup(response.text, 'html.parser')
info = html_soup.find_all('tr', class_='even')
print(info)

But the info list is empty. I tried changing the classes and the tags, but it seems I'm doing something wrong (the Morocco info is in the 30th row).

UPDATE: I used Selenium to get my data. By the way, I use Google Colab, so it was kind of hard, but it works much better now. The solution is linked in Python notebook format.

The data is being dynamically generated via JS. If you go into your browser and disable JavaScript in the dev tools, you will see that there are no elements with <tr class="even">.
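The same check can be done from Python instead of the browser: parse the raw HTML and count the matching rows. The sketch below is only an illustration. `count_matches` and the two sample HTML strings are hypothetical, made up to contrast a table that arrives in the server's HTML with one that a script would fill in later.

```python
from bs4 import BeautifulSoup


def count_matches(html, tag, css_class):
    """Return how many <tag class="css_class"> elements the raw HTML contains."""
    soup = BeautifulSoup(html, "html.parser")
    return len(soup.find_all(tag, class_=css_class))


# Hypothetical page whose rows are already present in the server's HTML.
static_page = """
<table><tbody>
  <tr class="even"><td>A</td></tr>
  <tr class="odd"><td>B</td></tr>
</tbody></table>
"""

# Hypothetical page whose rows would only be added later by a script.
dynamic_page = """
<table id="data"><tbody></tbody></table>
<script>/* rows are inserted here by JavaScript */</script>
"""

print(count_matches(static_page, "tr", "even"))   # 1
print(count_matches(dynamic_page, "tr", "even"))  # 0
```

If the count is 0 for the text returned by `requests.get(U).text`, changing the tag or class passed to `find_all` will not help, because the elements are not in that HTML at all.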

You will either need to find out where the data is being obtained (via some web API) using a tool like HTTP Trace, or use something like Selenium, which will run the JavaScript to load the HTML.
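For the Selenium route, a rough sketch might look like the following. It assumes Selenium 4 and a Chrome/chromedriver installation are available, which is not a given on every machine or notebook service. The function name `fetch_rendered_html` and its parameters are hypothetical, and the selector to wait for is whatever element you expect the page's scripts to create.

```python
def fetch_rendered_html(url, wait_selector, timeout=15):
    """Load a page in headless Chrome and return the HTML after scripts ran."""
    # Imported here so the function can be defined even where Selenium
    # is not installed; it is only needed when the function is called.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")    # no visible browser window
    options.add_argument("--no-sandbox")  # often needed inside containers
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until at least one element matching the selector exists,
        # or raise a TimeoutException after `timeout` seconds.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_selector))
        )
        return driver.page_source
    finally:
        driver.quit()  # always release the browser process
```

The returned string can then be handed to `BeautifulSoup(html, 'html.parser')` in place of `response.text`, for example after calling `fetch_rendered_html(U, 'tr.even')`.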

You want to pass a dictionary of tag attributes:

info = html_soup.find_all('tr', {'class':'even'})

This gave me the full list of countries.

import requests
from bs4 import BeautifulSoup

url = 'https://www.worldometers.info/coronavirus/'
response = requests.get(url)
html_soup = BeautifulSoup(response.text, 'html.parser')

# Each country name is a link with class "mt_a"
info = html_soup.find_all('a', {'class': 'mt_a'})

print(info[29].text)  # prints Morocco (index 29, i.e. the 30th entry)

# All the rest
for i in info:
    print(i.text)
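Indexing with `info[29]` ties the lookup to the country's position in the list. A minimal sketch of looking a country up by name and reading the rest of its row could look like this. `country_row` and `SAMPLE_HTML` are hypothetical: the sample is a heavily simplified stand-in for the real table, which has more columns and attributes.

```python
from bs4 import BeautifulSoup

# Hypothetical, heavily simplified stand-in for the real table markup.
SAMPLE_HTML = """
<table>
  <tr><td><a class="mt_a" href="#">Spain</a></td><td>1,000</td><td>50</td></tr>
  <tr><td><a class="mt_a" href="#">Morocco</a></td><td>200</td><td>10</td></tr>
</table>
"""


def country_row(html, country):
    """Return the cell texts of the table row whose link text is `country`."""
    soup = BeautifulSoup(html, "html.parser")
    for link in soup.find_all("a", {"class": "mt_a"}):
        if link.get_text(strip=True) == country:
            # Walk up from the link to its enclosing table row.
            row = link.find_parent("tr")
            return [cell.get_text(strip=True) for cell in row.find_all("td")]
    return None  # country not present in this HTML


print(country_row(SAMPLE_HTML, "Morocco"))  # ['Morocco', '200', '10']
```

With the real page, `response.text` would take the place of `SAMPLE_HTML`, provided the country links are present in the HTML that `requests` receives.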


 