简体   繁体   中英

Scraping Webpage With Beautiful Soup

I am new to web scraping and I am trying to scrape wind data from a website. Here is the website: https://wx.ikitesurf.com/spot/507 . I understand that I can do this using selenium to find elements but I think I may have found a better way. Please correct if I am wrong. When in developer tools I can find this page by going to network->JS->getGraph?

https://api.weatherflow.com/wxengine/rest/graph/getGraph?callback=jQuery17200020271765600428093_1619158293267&units_wind=mph&units_temp=f&units_distance=mi&fields=wind&format=json&null_ob_min_from_now=60&show_virtual_obs=true&spot_id=507&time_start_offset_hours=-36&time_end_offset_hours=0&type=dataonly&model_ids=-101&wf_token=3a648ec44797cbf12aca8ebc6c538868&_=1619158293881

This page contains all the data I need and it is constantly updating. Here is my code:

url = 'https://api.weatherflow.com/wxengine/rest/graph/getGraph?callback=jQuery17200020271765600428093_1619158293267&units_wind=mph&units_temp=f&units_distance=mi&fields=wind&format=json&null_ob_min_from_now=60&show_virtual_obs=true&spot_id=507&time_start_offset_hours=-36&time_end_offset_hours=0&type=dataonly&model_ids=-101&wf_token=3a648ec44797cbf12aca8ebc6c538868&_=1619158293881'
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
time.sleep(3)
wind = soup.find("last_ob_wind_desc")
print (wind)

I tried using beautiful soup to scrape but I always receive the answer "None". Does anyone know how I can scrape this page? I would like to know what I am doing wrong. Thanks for any help!

Removing callback=jQuery17200020271765600428093_1619158293267& from the api url will make it return proper json:

import requests

url = 'https://api.weatherflow.com/wxengine/rest/graph/getGraph?units_wind=mph&units_temp=f&units_distance=mi&fields=wind&format=json&null_ob_min_from_now=60&show_virtual_obs=true&spot_id=507&time_start_offset_hours=-36&time_end_offset_hours=0&type=dataonly&model_ids=-101&wf_token=3a648ec44797cbf12aca8ebc6c538868&_=1619158293881'
response = requests.get(url).json()

response is now a dictionary with the data. last_ob_wind_desc can be retrieved with response['last_ob_wind_desc'] .

You can also save the data to csv or other file formats with pandas :

import pandas as pd

df = pd.json_normalize(response)
df.to_csv('filename.csv')

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM