I am new to web scraping :). I would like to scrape a website to get information about vaccinations. Here is the website: https://ourworldindata.org/covid-vaccinations
My goal is to obtain the table with three columns:
Here is my code:
# Import the libraries used for fetching and parsing the page.
import requests
from bs4 import BeautifulSoup

# Fetch the target page and return its HTML as text.
def get_html(url):
    scrape_result = requests.get(url)
    return scrape_result.text

vac_html = get_html("https://ourworldindata.org/covid-vaccinations")

# Parse the HTML with BeautifulSoup, using the built-in "html.parser".
beautiful_soup = BeautifulSoup(vac_html, "html.parser")

# View the parsed HTML.
print(beautiful_soup.prettify())

# Find the content of interest: every table row (<tr>) in the page.
get_table = beautiful_soup.find_all("tr")
for x in get_table:
    print("*********")
    print(x)
Current output: The entire webpage as HTML. This is a fraction of it:
'\n<!DOCTYPE html>\n<!--[if IE 8]> <html lang="en" class="ie8"> <![endif]-->\n<!--[if IE 9]> <html lang="en" class="ie9"> <![endif]-->\n<!--[if !IE]><!-->\n<html lang="en">\n<!--<![endif]-->\n<head>\n<meta charset="utf-8">\n<meta http-equiv="X-UA-Compatible" content="IE=edge">\n<meta name="viewport" content="width=device-width, initial-scale=1">\n<title>COVID Live Update: 261,656,911 Cases and 5,216,375 Deaths from the Coronavirus - Worldometer</title>\n<meta name="description" content="Live statistics and coronavirus news tracking the number of confirmed cases, recovered patients, tests, and death toll due to the COVID-19 coronavirus from Wuhan, China. Coronavirus counter with new cases, deaths, and number of tests per 1 Million population. Historical data and info. Daily charts, graphs, news and updates">\n\n<link rel="shortcut icon" href="/favicon/favicon.ico" type="image/x-icon">\n<link rel="apple-touch-icon" sizes="57x57" href="/favicon/apple-icon-57x57.png">\n<link rel="apple-touch-icon" sizes="60x60" href="/favicon/apple-icon-60x60.png">\n<link rel="apple-touch-icon" sizes="72x72" href="/favicon/apple-icon-72x72.png">\n<link rel="apple-touch-icon" sizes="76x76" href="/favicon/apple-icon-76x76.png">\n<link rel="apple-touch-icon" sizes="114x114"
Unfortunately, it is not producing the information I would like to see. Does anyone have experience with web scraping who could quickly review my code?
Thanks in advance for your help!
I just took a quick look at that website. Instead of using Beautiful Soup, I suggest calling the same request the page itself uses to get its data. In the network requests (viewed using the browser's dev tools) you will find a GET request to https://covid.ourworldindata.org/data/internal/megafile--vaccinations.json; you can go back to the site and check this yourself. If you open that link, you can see that it returns a JSON object that you can parse.
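Before parsing further, it can help to check the top-level shape of what the endpoint returns. The following is only a minimal sketch: `describe_payload` and `fetch_json` are hypothetical helper names, and the sample payload is invented for illustration, since the real structure of the endpoint's JSON is not confirmed here.

```python
import json


def describe_payload(payload):
    """Return a short description of a decoded JSON payload's top-level shape."""
    if isinstance(payload, dict):
        return {"type": "dict", "keys": sorted(payload)}
    if isinstance(payload, list):
        first = payload[0] if payload else None
        keys = sorted(first) if isinstance(first, dict) else []
        return {"type": "list", "length": len(payload), "keys": keys}
    return {"type": type(payload).__name__}


def fetch_json(url, timeout=30):
    """Download `url` and decode its body as JSON (hypothetical helper)."""
    # Imported here so describe_payload can be tried without requests installed.
    import requests

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.json()


# Invented sample standing in for the real response; the actual keys may differ.
sample_text = '{"location": ["Afghanistan", "Zimbabwe"], "date": ["2021-11-01", "2021-11-01"]}'
sample_summary = describe_payload(json.loads(sample_text))
print(sample_summary)
```

Against the live site this would be called as `describe_payload(fetch_json(url))` with the URL found in the dev tools. Knowing whether the payload is a dict of columns or a list of records tells you how to load it afterwards.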
It's all there if you get the data directly from the source:
import requests
import pandas as pd
url = "https://covid.ourworldindata.org/data/internal/megafile--vaccinations-bydose.json"
jsonData = requests.get(url).json()
df = pd.DataFrame(jsonData)
Output:
print(df)
          location  ...  people_partly_vaccinated_per_hundred
0      Afghanistan  ...                              0.987197
1      Afghanistan  ...                              0.986009
2      Afghanistan  ...                              0.952562
3      Afghanistan  ...                              0.924529
4      Afghanistan  ...                              0.918366
...            ...  ...                                   ...
30218     Zimbabwe  ...                              6.310471
30219     Zimbabwe  ...                              6.384688
30220     Zimbabwe  ...                              6.429645
30221     Zimbabwe  ...                              6.429439
30222     Zimbabwe  ...                              6.447568

[30223 rows x 6 columns]
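The frame holds many rows per location, so to get a small table you would probably want one row per country and only a few columns. Here is a rough sketch of that step, run on a tiny invented frame. `latest_per_location` is a hypothetical helper, the `date` column name is an assumption (the printed output elides the middle columns), and the sample values are made up.

```python
import pandas as pd


def latest_per_location(df, columns, date_col="date", location_col="location"):
    """Keep the most recent row for each location, restricted to `columns`."""
    latest = (
        df.sort_values(date_col)      # ISO-formatted dates sort correctly as strings
          .groupby(location_col)
          .tail(1)                    # last (most recent) row of each group
    )
    return latest.sort_values(location_col)[columns].reset_index(drop=True)


# Invented sample data; "date" is an assumed column name.
sample = pd.DataFrame({
    "location": ["Afghanistan", "Afghanistan", "Zimbabwe", "Zimbabwe"],
    "date": ["2021-11-01", "2021-11-02", "2021-11-01", "2021-11-02"],
    "people_partly_vaccinated_per_hundred": [0.92, 0.91, 6.43, 6.45],
})

result = latest_per_location(
    sample, ["location", "date", "people_partly_vaccinated_per_hundred"]
)
print(result)
```

With the real data you would pass `df` and the three column names you actually need; `print(df.columns)` lists the names that are available.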