
Scraping data from a site where the URL doesn't change while changing options in a drop-down list

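I am using BeautifulSoup to scrape the Antwerp weather history table for 1 April 2017 from this web page. I need more than that one date, though: I need every day of April 2017, and those days are only reachable through a drop-down list.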

In the inspector, the drop-down is a select tag with one option per day.

I can get their values with the following code:

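import requests
from bs4 import BeautifulSoup

prefix = 'https://www.timeanddate.com'
weather_request = requests.get(prefix + '/weather/belgium/antwerp/historic?month=4&year=2017')
# 'html.parser' is an argument of BeautifulSoup, not of requests.get
weather = BeautifulSoup(weather_request.content, 'html.parser')

# collect (value, label) pairs for every option in the drop-down
options = []
for option in weather.select('select > option'):
    options.append((option.get('value'), option.text))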

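Can you help me scrape the tables behind the other values, given that the URL does not change when a different option is selected in the drop-down list?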

I found some similar questions, but none of them are about BeautifulSoup.

The data is loaded via Ajax from a different URL. The response is not JSON but raw JavaScript, so it needs some preprocessing before it can be parsed correctly.
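To illustrate the preprocessing step on its own, here is a minimal offline sketch. The payload string is made up for illustration (the real response may differ in detail), and `quote_keys` is a hypothetical helper name. It quotes the bare keys `c`, `h` and `s` only where they follow `{` or `,`, which is a slightly stricter assumption than matching the key letters anywhere in the text.

```python
import json
import re

# Made-up payload shaped like the response described above: the object keys
# (c, h, s) are bare identifiers, which strict JSON does not allow.
raw = '[{c:[{h:"12:20 am",s:"th"},{h:"50 °F"}]},{c:[{h:"12:50 am"},{h:"46 °F"}]}]'


def quote_keys(text):
    """Wrap the bare keys c, h and s in double quotes.

    Only keys that directly follow '{' or ',' are touched, so that a colon
    inside a string value (for example the time "12:20 am") is left alone.
    """
    return re.sub(r'([{,]\s*)(c|h|s):', r'\1"\2":', text)


rows = json.loads(quote_keys(raw))

# Each row holds its cells under "c"; each cell holds its content under "h".
first_row = [cell['h'] for cell in rows[0]['c']]
print(first_row)
```

Once the keys are quoted, `json.loads` accepts the text and the rows can be walked like any other list of dictionaries.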

For example:

import re
import json
from io import StringIO

import requests
import pandas as pd
from bs4 import BeautifulSoup


for day in range(1, 31):  # April has 30 days
    print('Getting info for day {}..'.format(day))
    url = 'https://www.timeanddate.com/scripts/cityajax.php?n=belgium/antwerp&mode=historic&hd=201704{:02d}&month=4&year=2017&json=1'.format(day)

    data = requests.get(url).text
    # the response uses bare object keys (c, h, s); quote them to get valid JSON
    data = json.loads(re.sub(r'(c|h|s):', r'"\1":', data))

    # uncomment this to print raw data:
    # print(json.dumps(data, indent=4))

    # construct the table from json:
    table = '<table>'
    for row in data:
        table += '<tr>'
        for cell in row['c']:
            # each cell's "h" value is an HTML fragment; keep only its text
            table += '<td>' + BeautifulSoup(cell['h'], 'html.parser').get_text(strip=True, separator=' ') + '</td>'
        table += '</tr>'
    table += '</table>'

    # `table` now holds an HTML table; parse it with BeautifulSoup, or pass it to pandas
    # (wrapped in StringIO, because recent pandas versions deprecate literal HTML strings):
    df = pd.read_html(StringIO(table))[0]
    print(df)
    print('-' * 120)

Prints:

Getting info for day 1..
                      0   1      2                            3      4  5     6          7      8
0   12:20 am Sat, Apr 1 NaN  50 °F                       Clear.  2 mph  ↑   94%  29.92 "Hg   2 mi
1              12:50 am NaN  46 °F                         Fog.  2 mph  ↑  100%  29.92 "Hg   2 mi
2               1:20 am NaN  48 °F                   Light fog.  3 mph  ↑   87%  29.89 "Hg   0 mi
3               1:50 am NaN  48 °F                       Clear.  3 mph  ↑   94%  29.89 "Hg   1 mi
4               2:20 am NaN  46 °F                         Fog.  5 mph  ↑  100%  29.89 "Hg   1 mi
5               3:20 am NaN  46 °F                       Clear.  3 mph  ↑   93%  29.89 "Hg   1 mi
6               3:50 am NaN  46 °F                         Fog.  6 mph  ↑   93%  29.86 "Hg   1 mi
7               4:20 am NaN  46 °F                         Fog.  3 mph  ↑  100%  29.86 "Hg   1 mi
8               4:50 am NaN  46 °F                         Fog.  3 mph  ↑  100%  29.86 "Hg   1 mi
9               5:20 am NaN  46 °F                         Fog.  2 mph  ↑   93%  29.86 "Hg   2 mi
10              5:50 am NaN  48 °F                       Clear.  3 mph  ↑   87%  29.86 "Hg   4 mi
11              6:20 am NaN  48 °F                       Clear.  5 mph  ↑   87%  29.83 "Hg   4 mi
12              6:50 am NaN  48 °F                       Clear.  5 mph  ↑   94%  29.86 "Hg   4 mi
13              7:20 am NaN  50 °F            Sprinkles. Clear.  6 mph  ↑   94%  29.86 "Hg   4 mi
14              7:50 am NaN  52 °F    Sprinkles. Broken clouds.  9 mph  ↑   88%  29.86 "Hg   3 mi
15              8:20 am NaN  52 °F    Light rain. Partly sunny.  8 mph  ↑   88%  29.86 "Hg   5 mi
16              8:50 am NaN  52 °F  Light rain. Passing clouds.  6 mph  ↑   94%  29.86 "Hg   5 mi
17              9:20 am NaN  52 °F       Drizzle. Partly sunny.  5 mph  ↑   94%  29.86 "Hg   5 mi
18              9:50 am NaN  52 °F               Broken clouds.  5 mph  ↑   94%  29.86 "Hg   5 mi
19             10:20 am NaN  52 °F               Broken clouds.  6 mph  ↑   94%  29.89 "Hg    NaN
20             10:50 am NaN  52 °F    Sprinkles. Broken clouds.  8 mph  ↑   94%  29.89 "Hg   5 mi
21             11:20 am NaN  52 °F                Partly sunny.  5 mph  ↑   94%  29.89 "Hg    NaN
22             11:50 am NaN  54 °F            Scattered clouds.  2 mph  ↑   88%  29.89 "Hg    NaN
23             12:20 pm NaN  55 °F            Scattered clouds.  5 mph  ↑   82%  29.89 "Hg    NaN
24             12:50 pm NaN  55 °F            Scattered clouds.  3 mph  ↑   77%  29.89 "Hg    NaN
25              1:20 pm NaN  57 °F              Passing clouds.  5 mph  ↑   72%  29.89 "Hg    NaN
26              1:50 pm NaN  57 °F              Passing clouds.  3 mph  ↑   67%  29.89 "Hg    NaN
27              2:20 pm NaN  57 °F              Passing clouds.  7 mph  ↑   72%  29.89 "Hg    NaN
28              2:50 pm NaN  57 °F            Scattered clouds.  3 mph  ↑   72%  29.89 "Hg    NaN
29              3:20 pm NaN  55 °F    Sprinkles. Broken clouds.  9 mph  ↑   77%  29.89 "Hg   4 mi
30              3:50 pm NaN  55 °F    Sprinkles. Broken clouds.  3 mph  ↑   77%  29.86 "Hg   5 mi
31              4:20 pm NaN  55 °F    Sprinkles. Broken clouds.  2 mph  ↑   82%  29.89 "Hg    NaN
32              4:50 pm NaN  57 °F            Scattered clouds.  2 mph  ↑   77%  29.86 "Hg    NaN
33              5:20 pm NaN  57 °F            Scattered clouds.  7 mph  ↑   72%  29.89 "Hg    NaN
34              5:50 pm NaN  55 °F            Scattered clouds.  6 mph  ↑   88%  29.89 "Hg    NaN
35              6:20 pm NaN  55 °F              Passing clouds.  6 mph  ↑   82%  29.89 "Hg    NaN
36              6:50 pm NaN  55 °F              Passing clouds.  3 mph  ↑   82%  29.89 "Hg    NaN
37              7:20 pm NaN  54 °F              Passing clouds.  5 mph  ↑   94%  29.89 "Hg    NaN
38              7:50 pm NaN  54 °F              Passing clouds.  5 mph  ↑   88%  29.89 "Hg    NaN
39              8:20 pm NaN  54 °F              Passing clouds.  7 mph  ↑   88%  29.92 "Hg    NaN
40              8:50 pm NaN  54 °F                       Clear.  7 mph  ↑   88%  29.92 "Hg  10 mi
41              9:20 pm NaN  54 °F                       Clear.  2 mph  ↑   88%  29.92 "Hg  10 mi
42              9:50 pm NaN  52 °F                       Clear.  5 mph  ↑   94%  29.92 "Hg  10 mi
43             10:20 pm NaN  48 °F                       Clear.  2 mph  ↑  100%  29.95 "Hg  10 mi
44             10:50 pm NaN  52 °F                       Clear.  3 mph  ↑   88%  29.95 "Hg   4 mi
45             11:20 pm NaN  46 °F                         Fog.  2 mph  ↑   93%  29.95 "Hg   1 mi
46             11:50 pm NaN  46 °F                       Clear.  3 mph  ↑   93%  29.95 "Hg   0 mi
------------------------------------------------------------------------------------------------------------------------
Getting info for day 2..
                      0   1      2                  3       4  5     6          7      8
0   12:20 am Sun, Apr 2 NaN  45 °F               Fog.   2 mph  ↑  100%  29.95 "Hg   0 mi
1              12:50 am NaN  45 °F               Fog.   2 mph  ↑   93%  29.98 "Hg   1 mi
2               1:20 am NaN  45 °F               Fog.   2 mph  ↑  100%  29.95 "Hg   0 mi
3               1:50 am NaN  45 °F             Clear.   3 mph  ↑   87%  29.98 "Hg   4 mi
4               2:20 am NaN  48 °F             Clear.   6 mph  ↑   87%  29.98 "Hg  10 mi
5               2:50 am NaN  48 °F             Clear.   2 mph  ↑   87%  29.98 "Hg  10 mi
6               3:20 am NaN  48 °F             Clear.   5 mph  ↑   87%  29.98 "Hg  10 mi
7               3:50 am NaN  48 °F             Clear.   2 mph  ↑   87%  29.98 "Hg   6 mi
8               4:50 am NaN  46 °F             Clear.   2 mph  ↑   87%  30.01 "Hg  10 mi
9               5:20 am NaN  46 °F    Passing clouds.   3 mph  ↑   87%  30.01 "Hg    NaN
10              5:50 am NaN  46 °F             Clear.   2 mph  ↑   87%  30.01 "Hg  10 mi
11              6:20 am NaN  46 °F             Clear.   1 mph  ↑   87%  30.04 "Hg   4 mi
12              6:50 am NaN  45 °F         Light fog.   2 mph  ↑   93%  30.04 "Hg   5 mi


... and so on.
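The loop above hardcodes `range(1, 31)`, which happens to fit April. For other months, one option is to derive the number of days from the standard library. The sketch below only builds the URLs and makes no request; `daily_urls` is a hypothetical helper, and the URL pattern is copied from the script above on the assumption that the endpoint accepts the same parameters for other months.

```python
import calendar

BASE = 'https://www.timeanddate.com/scripts/cityajax.php'


def daily_urls(place, year, month):
    """Return one Ajax URL per day of the given month."""
    # monthrange returns (weekday of the first day, number of days in the month)
    days_in_month = calendar.monthrange(year, month)[1]
    return [
        f'{BASE}?n={place}&mode=historic&hd={year}{month:02d}{day:02d}'
        f'&month={month}&year={year}&json=1'
        for day in range(1, days_in_month + 1)
    ]


urls = daily_urls('belgium/antwerp', 2017, 4)
print(len(urls))
print(urls[0])
```

Each URL can then be fetched and parsed in the same way as inside the loop of the script above.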
