
Python beautifulsoup, scraping a table in a website

I recently got interested in web scraping with the Python library beautifulsoup4. My goal is to get the data about COVID-19 cases (Morocco is a good start). The website with the data is https://www.worldometers.info/coronavirus/ and it has a big table with all the information. I tried something like this:

import requests
from bs4 import BeautifulSoup

U = 'https://www.worldometers.info/coronavirus/'
response = requests.get(U)
html_soup = BeautifulSoup(response.text, 'html.parser')
info = html_soup.find_all('tr', class_='even')
print(info)

But the info list is empty. I tried changing the class and the tag, but it seems I'm doing something wrong (the Morocco data is in row 30).

UPDATE: I used Selenium to get my data. By the way, I use Google Colab, so it was a bit hard to set up, but it works much better now. Link to the solution in Python notebook format.

The data is generated dynamically by JavaScript. If you disable JavaScript in your browser's dev tools and reload the page, you will see that there are no elements matching <tr class="even">.
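The same check can be done from Python instead of the browser. The sketch below counts how many elements with a given tag and class exist in a piece of HTML; the helper name count_matches and the sample markup are made up for illustration and are not taken from the real page. Running it on response.text from the question would show whether the rows exist before any JavaScript runs.

```python
from bs4 import BeautifulSoup


def count_matches(html, tag, css_class):
    """Return how many <tag> elements carrying css_class exist in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return len(soup.find_all(tag, class_=css_class))


# Hypothetical stand-in for response.text: a table whose rows have no
# "even" class because a script would normally add it in the browser.
static_html = """
<table id="example">
  <thead><tr><th>Country</th><th>Cases</th></tr></thead>
  <tbody><tr><td>Morocco</td><td>123</td></tr></tbody>
</table>
"""

print(count_matches(static_html, "tr", "even"))  # 0
```

A count of 0 on the raw response, together with a non-zero count in the browser's rendered DOM, would be consistent with the class being added client-side.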

You will need either to find out where the data comes from (some web API), using a tool like HTTP Trace, or to use something like Selenium, which runs the JavaScript and produces the final HTML.
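A minimal sketch of the Selenium route, assuming the selenium package and a Chrome browser are installed. The function names render_page and table_rows are illustrative only. page_source is read right after the page loads, so a page that fills its table later may need an explicit wait, which is left out here.

```python
from bs4 import BeautifulSoup


def render_page(url):
    """Load url in headless Chrome and return the HTML after scripts have run.

    Assumption: selenium and Chrome are available on this machine.
    """
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even on errors


def table_rows(html):
    """Return every non-empty table row as a list of stripped cell texts."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["td", "th"])]
        if cells:
            rows.append(cells)
    return rows
```

Usage would look like rows = table_rows(render_page('https://www.worldometers.info/coronavirus/')). The parsing step is kept separate from the browser step so it can be tried on any saved HTML without launching Chrome.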

You want to pass a dictionary of tag attributes:

info = html_soup.find_all('tr', {'class':'even'})

This gave me a full list of countries.

import requests
from bs4 import BeautifulSoup

url = 'https://www.worldometers.info/coronavirus/'
response = requests.get(url)
html_soup = BeautifulSoup(response.text, 'html.parser')

# Country names are the texts of the links with class "mt_a"
info = html_soup.find_all('a', {'class': 'mt_a'})

print(info[29].text)  # returns Morocco

# All the rest
for i in info:
    print(i.text)
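Since the position of a country in the list can change, looking it up by name may be more robust than using index 29. The sketch below assumes that each "mt_a" link sits inside the table row for its country; the helper row_for_country and the sample markup are hypothetical and only illustrate the idea.

```python
from bs4 import BeautifulSoup


def row_for_country(html, country):
    """Return the cell texts of the row whose 'mt_a' link names the country.

    Returns None when no matching link (or enclosing row) is found.
    """
    soup = BeautifulSoup(html, "html.parser")
    for link in soup.find_all("a", {"class": "mt_a"}):
        if link.get_text(strip=True).lower() == country.lower():
            row = link.find_parent("tr")
            if row is not None:
                return [td.get_text(strip=True) for td in row.find_all("td")]
    return None


# Hypothetical markup imitating the assumed structure of the table.
sample_html = """
<table>
  <tr><td><a class="mt_a" href="#">Spain</a></td><td>10</td></tr>
  <tr><td><a class="mt_a" href="#">Morocco</a></td><td>20</td></tr>
</table>
"""

print(row_for_country(sample_html, "Morocco"))  # ['Morocco', '20']
```

With the real page, response.text would be passed in place of sample_html.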
