Scraping vaccination data using BeautifulSoup (Python)

Question

I am new to scraping :). I would like to scrape a website to get information about vaccination. Here is the website: https://ourworldindata.org/covid-vaccinations

My goal is to obtain the table with three columns:

"Country"
"Share of people fully vaccinated against COVID-19"
"Share of people only partly vaccinated against COVID-19"

Here is my code:

# importing basic libraries
import requests
from bs4 import BeautifulSoup


# request for getting the target html.
def get_html(URL):
    scrape_result = requests.get(URL)
    return scrape_result.text
vac_html = get_html("https://ourworldindata.org/covid-vaccinations")

# the BeautifulSoup library for scraping the data, with "html.parser" for parsing.
beatiful_soup = BeautifulSoup(vac_html, "html.parser")

# view the html script.
print(beatiful_soup.prettify())

# finding the content of interest 
get_table = beatiful_soup.find_all("tr")

for x in get_table:
    print("*********")
    print(x)

Current output: The entire webpage as HTML. This is a fraction of it:


'\n<!DOCTYPE html>\n<!--[if IE 8]> <html lang="en" class="ie8"> <![endif]-->\n<!--[if IE 9]> <html lang="en" class="ie9"> <![endif]-->\n<!--[if !IE]><!-->\n<html lang="en">\n<!--<![endif]-->\n<head>\n<meta charset="utf-8">\n<meta http-equiv="X-UA-Compatible" content="IE=edge">\n<meta name="viewport" content="width=device-width, initial-scale=1">\n<title>COVID Live Update: 261,656,911 Cases and 5,216,375 Deaths from the Coronavirus - Worldometer</title>\n<meta name="description" content="Live statistics and coronavirus news tracking the number of confirmed cases, recovered patients, tests, and death toll due to the COVID-19 coronavirus from Wuhan, China. Coronavirus counter with new cases, deaths, and number of tests per 1 Million population. Historical data and info. Daily charts, graphs, news and updates">\n\n<link rel="shortcut icon" href="/favicon/favicon.ico" type="image/x-icon">\n<link rel="apple-touch-icon" sizes="57x57" href="/favicon/apple-icon-57x57.png">\n<link rel="apple-touch-icon" sizes="60x60" href="/favicon/apple-icon-60x60.png">\n<link rel="apple-touch-icon" sizes="72x72" href="/favicon/apple-icon-72x72.png">\n<link rel="apple-touch-icon" sizes="76x76" href="/favicon/apple-icon-76x76.png">\n<link rel="apple-touch-icon" sizes="114x114"

Unfortunately, it is not producing the information I would like to see. Does anyone have experience in web scraping and could quickly review my code?

Thanks in advance for your help!

Answer 1

I just took a quick look at that website. Instead of using Beautiful Soup, I suggest you use the same request that the site itself uses to get the data. In the network requests (viewed using dev tools) you will find a GET request to https://covid.ourworldindata.org/data/internal/megafile--vaccinations.json; you can go back to the site and check this yourself. If you open that link, you will see that it returns a JSON object that you can parse.
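As a rough illustration of that suggestion, here is a minimal sketch that downloads a JSON endpoint and summarizes its top-level structure before you decide how to parse it. It uses only the standard library so it runs without extra installs. The helper names `fetch_json` and `describe` are hypothetical, and the shape of the real payload is not shown in this thread, so treat the summary step as exploratory.

```python
import json
from urllib.request import urlopen

# URL quoted in the answer above; it may no longer be served.
DATA_URL = "https://covid.ourworldindata.org/data/internal/megafile--vaccinations.json"


def fetch_json(url, timeout=30):
    """Download a URL and decode the response body as JSON."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def describe(payload):
    """Return a short summary of the top-level shape of a parsed JSON value."""
    if isinstance(payload, dict):
        # Map each key to the type name of its value, e.g. "list" or "str".
        return {key: type(value).__name__ for key, value in payload.items()}
    if isinstance(payload, list):
        return {"list_length": len(payload)}
    return {"type": type(payload).__name__}


# Example (needs network access):
# print(describe(fetch_json(DATA_URL)))
```

Looking at the keys and value types first makes it easier to tell whether the payload can go straight into a table or needs reshaping.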

Answer 2

It's all there if you get the data directly from the source:

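import requests
import pandas as pd

# Fetch the JSON data directly and parse the response body
url = "https://covid.ourworldindata.org/data/internal/megafile--vaccinations-bydose.json"
jsonData = requests.get(url).json()

# Build a DataFrame from the parsed JSON
df = pd.DataFrame(jsonData)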

Output:

print(df)
          location  ... people_partly_vaccinated_per_hundred
0      Afghanistan  ...                             0.987197
1      Afghanistan  ...                             0.986009
2      Afghanistan  ...                             0.952562
3      Afghanistan  ...                             0.924529
4      Afghanistan  ...                             0.918366
           ...  ...                                  ...
30218     Zimbabwe  ...                             6.310471
30219     Zimbabwe  ...                             6.384688
30220     Zimbabwe  ...                             6.429645
30221     Zimbabwe  ...                             6.429439
30222     Zimbabwe  ...                             6.447568

[30223 rows x 6 columns]
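The frame above holds many rows per country, while the question asks for one row per country with three columns. A minimal sketch of that reduction follows. It assumes the frame has a `date` column and a `people_fully_vaccinated_per_hundred` column: both names are guesses, because the printed output elides the middle columns. Only `location` and `people_partly_vaccinated_per_hundred` are confirmed by the output. The function name `latest_share_table` is hypothetical.

```python
import pandas as pd


def latest_share_table(
    df,
    country_col="location",
    date_col="date",  # assumed column name, not visible in the printed output
    fully_col="people_fully_vaccinated_per_hundred",  # assumed column name
    partly_col="people_partly_vaccinated_per_hundred",
):
    """Return one row per country, taken from that country's most recent date."""
    # Parse dates so the sort is chronological rather than string-based.
    ordered = df.assign(**{date_col: pd.to_datetime(df[date_col])})
    ordered = ordered.sort_values([country_col, date_col])
    # After sorting, the last row of each country is its latest observation.
    latest = ordered.drop_duplicates(subset=country_col, keep="last")
    return latest[[country_col, fully_col, partly_col]].reset_index(drop=True)


# Example, using the DataFrame built above:
# table = latest_share_table(df)
```

If the real column names differ, pass them through the keyword arguments after checking `df.columns`.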


Answer 1
3 2021-11-28 21:15:26

Answer 2
0 2021-11-29 15:57:34
