
How to scrape a dynamic table from wunderground

I am having trouble scraping a table with Python. An example is the big table on a weather history website that lists all the readings for every hour.

import requests
from bs4 import BeautifulSoup

url = "https://www.wunderground.com/history/daily/us/va/arlington-county/KDCA/date/2019-1-25"
page = requests.get(url)
soup = BeautifulSoup(page.text, "html.parser")
# Class string copied from the browser's element inspector
my_table = soup.find("table", class_="mat-table cdk-table mat-sort ng-star-inserted")
print(my_table)

I got the class attribute by inspecting the HTML. The problem is that I get None, as if the table did not exist. I checked that the website responds with status 200, so that is not the problem. Am I missing something here?

Thanks,

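The table is constructed dynamically by JavaScript from data that comes from an API endpoint. It is therefore not part of the HTML that requests downloads, which is why soup.find returns None even though the response status is 200.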
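To illustrate the difference between the downloaded HTML and what the browser's inspector shows, here is a minimal sketch. The two HTML strings are made-up stand-ins (not the actual wunderground markup), and the standard-library parser is used in place of BeautifulSoup only to keep the example dependency-free; TableCounter and count_tables are hypothetical names.

```python
from html.parser import HTMLParser

# Illustrative stand-ins, not the real wunderground markup.
# What an HTTP client receives: an empty container plus a script reference.
STATIC_SHELL = """
<html><body>
  <app-root></app-root>
  <script src="main.js"></script>
</body></html>
"""

# What the inspector shows after the script has filled the container.
RENDERED = """
<html><body>
  <app-root>
    <table class="mat-table"><tr><td>37</td></tr></table>
  </app-root>
</body></html>
"""


class TableCounter(HTMLParser):
    """Counts opening <table> tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.count += 1


def count_tables(html):
    """Return the number of <table> tags found in the given HTML text."""
    parser = TableCounter()
    parser.feed(html)
    parser.close()
    return parser.count


print(count_tables(STATIC_SHELL))  # 0
print(count_tables(RENDERED))      # 1
```

A quick check along these lines on page.text (for example, searching it for "mat-table") would show whether the table is present before any parsing is attempted.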

You can query that endpoint and reconstruct the table.

Here's how:

from datetime import datetime

import requests
from tabulate import tabulate

# Historical observations for station KDCA on 2019-01-25
endpoint = "https://api.weather.com/v1/location/KDCA:9:US/observations/historical.json?apiKey=e1f10a1e78da46f5b10a1e78da96f525&units=e&startDate=20190125&endDate=20190125"

# The JSON payload holds one entry per observation; sort them chronologically
response = requests.get(endpoint).json()["observations"]
weather_data = sorted(response, key=lambda k: k["valid_time_gmt"])

header = [
    "Time", "Temperature", "Dew Point", "Humidity", "Wind",
    "Wind Speed", "Wind Gust", "Pressure", "Precip.", "Conditions",
]

table = []
for item in weather_data:
    row = [
        # fromtimestamp() converts to the local timezone of the machine
        # running the script, so the times shown depend on where it runs
        datetime.fromtimestamp(item["valid_time_gmt"]).strftime('%I:%M %p'),
        item["temp"],
        f'{item["dewPt"]} °F',
        f'{item["rh"]} %',
        item["wdir_cardinal"],
        item["wspd"],
        # Missing gust and precipitation values are shown as zero
        f'{item["gust"] or 0} mph',
        f'{item["pressure"]} in',
        f'{item["precip_total"] or 0.0} in',
        item["wx_phrase"],
    ]
    table.append(row)

print(tabulate(table, headers=header, tablefmt="pretty"))

Sample output:

+----------+-------------+-----------+----------+------+------------+-----------+----------+---------+---------------+
|   Time   | Temperature | Dew Point | Humidity | Wind | Wind Speed | Wind Gust | Pressure | Precip. |  Conditions   |
+----------+-------------+-----------+----------+------+------------+-----------+----------+---------+---------------+
| 06:52 AM |     37      |   22 °F   |   54 %   | WNW  |     10     |   0 mph   | 29.91 in | 0.0 in  |     Fair      |
| 07:52 AM |     36      |   21 °F   |   55 %   |  NW  |     6      |   0 mph   | 29.92 in | 0.0 in  |     Fair      |
| 08:52 AM |     35      |   21 °F   |   57 %   |  W   |     7      |   0 mph   | 29.95 in | 0.0 in  |     Fair      |
| 09:26 AM |     35      |   21 °F   |   57 %   |  W   |     6      |   0 mph   | 29.95 in | 0.0 in  | Mostly Cloudy |
| 09:52 AM |     35      |   22 °F   |   59 %   |  W   |     5      |   0 mph   | 29.96 in | 0.0 in  | Mostly Cloudy |
| 10:52 AM |     36      |   22 °F   |   57 %   | VAR  |     5      |   0 mph   | 29.98 in | 0.0 in  |    Cloudy     |
| 11:52 AM |     35      |   21 °F   |   57 %   | WNW  |     9      |   0 mph   | 30.0 in  | 0.0 in  | Partly Cloudy |
| 12:52 PM |     35      |   21 °F   |   57 %   |  NW  |     10     |   0 mph   | 30.0 in  | 0.79 in | Mostly Cloudy |
| 01:52 PM |     35      |   21 °F   |   57 %   | WNW  |     5      |   0 mph   | 30.02 in | 0.0 in  | Partly Cloudy |
| 02:52 PM |     37      |   22 °F   |   54 %   | WSW  |     8      |   0 mph   | 30.04 in | 0.0 in  | Partly Cloudy |
| 03:52 PM |     40      |   22 °F   |   49 %   |  W   |     14     |  18 mph   | 30.04 in | 0.0 in  |     Fair      |
| 04:52 PM |     41      |   19 °F   |   41 %   |  W   |     17     |  21 mph   | 30.06 in | 0.0 in  | Partly Cloudy |
| 05:52 PM |     43      |   20 °F   |   40 %   | WNW  |     14     |  23 mph   | 30.06 in | 0.0 in  | Partly Cloudy |
| 06:52 PM |     41      |   18 °F   |   40 %   |  W   |     16     |   0 mph   | 30.07 in | 0.0 in  | Mostly Cloudy |
| 07:21 PM |     42      |   19 °F   |   40 %   |  NW  |     16     |  24 mph   | 30.07 in | 0.0 in  | Mostly Cloudy |
| 07:52 PM |     43      |   20 °F   |   40 %   |  NW  |     18     |  24 mph   | 30.08 in | 0.0 in  | Partly Cloudy |
| 08:52 PM |     43      |   12 °F   |   28 %   |  NW  |     9      |  20 mph   | 30.09 in | 0.0 in  |     Fair      |
| 09:52 PM |     42      |   11 °F   |   28 %   | WNW  |     13     |   0 mph   | 30.09 in | 0.0 in  |     Fair      |
| 10:52 PM |     40      |   10 °F   |   29 %   |  NW  |     13     |   0 mph   | 30.12 in | 0.0 in  |     Fair      |
| 11:52 PM |     37      |   12 °F   |   36 %   | NNW  |     10     |   0 mph   | 30.14 in | 0.0 in  |     Fair      |
| 12:52 AM |     36      |   8 °F    |   31 %   | NNW  |     12     |   0 mph   | 30.16 in | 0.0 in  |     Fair      |
| 01:52 AM |     35      |   8 °F    |   33 %   | NNW  |     9      |   0 mph   | 30.17 in | 0.0 in  |     Fair      |
| 02:52 AM |     33      |   7 °F    |   34 %   |  NW  |     13     |   0 mph   | 30.18 in | 0.0 in  |     Fair      |
| 03:52 AM |     32      |   7 °F    |   35 %   | NNW  |     5      |   0 mph   | 30.18 in | 0.0 in  |     Fair      |
| 04:52 AM |     32      |   7 °F    |   35 %   |  N   |     5      |   0 mph   | 30.2 in  | 0.0 in  |     Fair      |
| 05:52 AM |     31      |   7 °F    |   37 %   | NNW  |     6      |   0 mph   | 30.22 in | 0.0 in  |     Fair      |
+----------+-------------+-----------+----------+------+------------+-----------+----------+---------+---------------+
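
Two details may be worth adapting. The Time column comes from datetime.fromtimestamp, which uses the timezone of the machine running the script, so the hours may not match the station's clock. The endpoint is also hard-coded to one station and day. The sketch below shows one possible way to handle both. It assumes that KDCA is on Eastern Standard Time (UTC-5) on this winter date, and that the ":9:US" suffix and YYYYMMDD date format seen in the endpoint apply to other requests as well. station_time and params_from_page_url are hypothetical helpers, not part of any library.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlparse

# Assumption: the station is on Eastern Standard Time (UTC-5).
# A fixed offset ignores daylight saving time, so it only suits winter dates.
EST = timezone(timedelta(hours=-5), "EST")


def station_time(epoch_seconds, tz=EST):
    """Format a Unix timestamp as a 12-hour clock time in the given timezone."""
    return datetime.fromtimestamp(epoch_seconds, tz=tz).strftime("%I:%M %p")


def params_from_page_url(page_url):
    """Derive request values from a .../<STATION>/date/<Y-M-D> history URL.

    The ':9:US' suffix and the YYYYMMDD format are copied from the endpoint
    in the answer; whether they hold for other stations is an assumption.
    """
    parts = urlparse(page_url).path.strip("/").split("/")
    date_index = parts.index("date")
    station = parts[date_index - 1]
    day = datetime.strptime(parts[date_index + 1], "%Y-%m-%d")
    stamp = day.strftime("%Y%m%d")
    return {
        "location": f"{station}:9:US",
        "startDate": stamp,
        "endDate": stamp,
    }


page_url = "https://www.wunderground.com/history/daily/us/va/arlington-county/KDCA/date/2019-1-25"
print(params_from_page_url(page_url))
```

In the loop, station_time(item["valid_time_gmt"]) could replace the datetime.fromtimestamp(...) call, and the returned values could be substituted into the endpoint string.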
