Stuck with HTML scraping using BeautifulSoup (Python)

I want to convert activities uploaded to Strava to a .gpx file.

       </li>
      </ul>
     </div>
    </nav>
   </header>
   <div data-react-class="ActivityPublic" data-react-props='{
  "activity": {
    "name": "Morning Ride",
    "date": "Today",
    "athlete": {
      "name": "James Whyard",
      "avatarUrl": "https://lh3.googleusercontent.com/a-/AOh14GiA8yxgfozOqSJEiwW9srS-VEZU_mV_UM2iHFZxjw=s96-c",
      "location": "",
      "followersCount": 3,
      "followAthleteUrl": "http://www.strava.com/register?activity_action=athlete\u0026activity_id=7487240518\u0026athlete_id=90220142\u0026content=90220142\u0026cta=follow\u0026element=button\u0026follow_athlete_after_login=true\u0026follow_athlete_after_registration=true\u0026follow_athlete_id=90220142\u0026source=activities_show",
      "totalDistance": "452",
      "distanceUnit": "miles",
      "totalActivities": 40
    },
    "type": "Ride",
    "detailedType": "Ride",
    "kudosCount": 0,
    "comments": [],
    "commentCount": 0,
    "achievementsCount": 11,
    "distance": "11.7 mi",
    "time": "49:38",
    "elevation": "246 ft",
    "calories": 526.0,
    "streams": {
      "altitude": [6.6, 6.6, 6.6, 6.7, 6.7, 6.7, 6.7, 6.7, 6.7, 6.9, 6.7, 6.6, 6.5, 6.4, 6.4, 6.4, 6.4, 6.2, 5.9, 6.0, 5.9, 5.8, 5.7, 5.6, 5.6, 5.6, 5.7, 5.9, 6.0, 6.0, 5.9, 5.9, 5.9, 6.0, 6.0, 6.0, 6.0, 6.0, 6.1, 6.2, 6.2, 6.4, 6.5, 6.5, 6.6, 6.9, 7.2, 7.2, 7.4
You can call .get on the element to read an attribute's value, for example:

import requests
from bs4 import BeautifulSoup

url = 'https://www.strava.com/activities/7487240518'
urlr = requests.get(url)

soup = BeautifulSoup(urlr.content, 'html.parser')

divdata = soup.find('div', {'data-react-class':'ActivityPublic'})
strdata = divdata.get('data-react-props')
print(strdata)

You're very close with this!

What I would do is grab the div element as you are doing, then read its data-react-props attribute, which contains all the data you're looking for. The value is a JSON string, so we can parse it as JSON and pull whatever we need out of the resulting dictionary.

import json

import requests
from bs4 import BeautifulSoup

url = 'https://www.strava.com/activities/7487240518'
urlr = requests.get(url)

soup = BeautifulSoup(urlr.content, 'html.parser')

# The activity data lives in the data-react-props attribute of this div.
divdata = soup.find('div', {'data-react-class':'ActivityPublic'})
activity_data = divdata.get("data-react-props")
# The attribute value is a JSON string, so parse it into a dict.
activity_dict = json.loads(activity_data)

print("My ride's elevation was:", activity_dict['activity']['elevation'])
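Since the original goal was a .gpx file, here is a rough sketch of how the parsed streams could be written out as GPX using only the standard library. It rests on assumptions: the truncated page source above only shows an "altitude" stream, so the "latlng" key (a list of [lat, lon] pairs) is hypothetical, the altitudes are assumed to be in metres, and the function name streams_to_gpx and the sample data are made up for illustration.

```python
import xml.etree.ElementTree as ET


def streams_to_gpx(streams, name="Activity"):
    """Build a GPX 1.1 document from a dict of parallel stream lists.

    Assumption: streams["latlng"] is a list of [lat, lon] pairs (hypothetical
    key, not visible in the truncated page source) and streams["altitude"] is
    a list of elevations, assumed to be in metres, aligned by index.
    """
    gpx = ET.Element("gpx", {
        "version": "1.1",
        "creator": "strava-scrape-sketch",
        "xmlns": "http://www.topografix.com/GPX/1/1",
    })
    trk = ET.SubElement(gpx, "trk")
    ET.SubElement(trk, "name").text = name
    seg = ET.SubElement(trk, "trkseg")

    altitudes = streams.get("altitude", [])
    for i, (lat, lon) in enumerate(streams.get("latlng", [])):
        pt = ET.SubElement(seg, "trkpt", {"lat": str(lat), "lon": str(lon)})
        # Only attach an elevation when one exists for this point.
        if i < len(altitudes):
            ET.SubElement(pt, "ele").text = str(altitudes[i])

    return ET.tostring(gpx, encoding="unicode")


# Made-up sample standing in for activity_dict['activity']['streams'].
sample_streams = {
    "latlng": [[51.5, -0.12], [51.501, -0.121]],
    "altitude": [6.6, 6.7],
}
gpx_text = streams_to_gpx(sample_streams, name="Morning Ride")
```

With real data you would pass activity_dict['activity']['streams'] instead of sample_streams and write gpx_text to a file. If the page does not expose coordinates under that key, the track segment will simply be empty.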

Edit: @It_is_Chris suggested using the Strava API instead: https://developers.strava.com/docs/reference/. This seems like a better alternative.
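As a rough idea of what the API route might look like, the sketch below builds a request for the activity streams endpoint described in the Strava API reference. It assumes you already have an OAuth access token. The helper name build_streams_request and the placeholder token are made up for illustration, and the request is only constructed here, not sent.

```python
from urllib.parse import urlencode

API_BASE = "https://www.strava.com/api/v3"


def build_streams_request(activity_id, access_token,
                          keys=("latlng", "altitude", "time")):
    """Return (url, headers) for fetching an activity's streams.

    Hypothetical helper: it only assembles the request, so it can be
    inspected without network access or a valid token.
    """
    query = urlencode({"keys": ",".join(keys), "key_by_type": "true"})
    url = f"{API_BASE}/activities/{activity_id}/streams?{query}"
    headers = {"Authorization": f"Bearer {access_token}"}
    return url, headers


# "YOUR_ACCESS_TOKEN" is a placeholder, not a working credential.
url, headers = build_streams_request(7487240518, "YOUR_ACCESS_TOKEN")

# To actually send it (needs a real token):
#   response = requests.get(url, headers=headers)
#   streams = response.json()
```

Keeping the URL construction separate from the network call makes it easy to check the request before spending API quota on it.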
