
Stuck with HTML scraping using BeautifulSoup (Python)


       </li>
      </ul>
     </div>
    </nav>
   </header>
   <div data-react-class="ActivityPublic" data-react-props='{
  "activity": {
    "name": "Morning Ride",
    "date": "Today",
    "athlete": {
      "name": "James Whyard",
      "avatarUrl": "https://lh3.googleusercontent.com/a-/AOh14GiA8yxgfozOqSJEiwW9srS-VEZU_mV_UM2iHFZxjw=s96-c",
      "location": "",
      "followersCount": 3,
      "followAthleteUrl": "http://www.strava.com/register?activity_action=athlete\u0026activity_id=7487240518\u0026athlete_id=90220142\u0026content=90220142\u0026cta=follow\u0026element=button\u0026follow_athlete_after_login=true\u0026follow_athlete_after_registration=true\u0026follow_athlete_id=90220142\u0026source=activities_show",
      "totalDistance": "452",
      "distanceUnit": "miles",
      "totalActivities": 40
    },
    "type": "Ride",
    "detailedType": "Ride",
    "kudosCount": 0,
    "comments": [],
    "commentCount": 0,
    "achievementsCount": 11,
    "distance": "11.7 mi",
    "time": "49:38",
    "elevation": "246 ft",
    "calories": 526.0,
    "streams": {
      "altitude": [6.6, 6.6, 6.6, 6.7, 6.7, 6.7, 6.7, 6.7, 6.7, 6.9, 6.7, 6.6, 6.5, 6.4, 6.4, 6.4, 6.4, 6.2, 5.9, 6.0, 5.9, 5.8, 5.7, 5.6, 5.6, 5.6, 5.7, 5.9, 6.0, 6.0, 5.9, 5.9, 5.9, 6.0, 6.0, 6.0, 6.0, 6.0, 6.1, 6.2, 6.2, 6.4, 6.5, 6.5, 6.6, 6.9, 7.2, 7.2, 7.4

import requests
from bs4 import BeautifulSoup

url = 'https://www.strava.com/activities/7487240518'
urlr = requests.get(url)

soup = BeautifulSoup(urlr.content, 'html.parser')

divdata = soup.find('div', {'data-react-class':'ActivityPublic'})
strdata = divdata.get('data-react-props')
print(strdata)

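You were very close!

All I did was grab the div element, just as you did, and then read its data-react-props attribute, which contains all the data you are looking for. The attribute value is plainly formatted as JSON, so we can parse it as JSON and pull out whatever we need from there.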

import json

import requests
from bs4 import BeautifulSoup

url = 'https://www.strava.com/activities/7487240518'
urlr = requests.get(url)

soup = BeautifulSoup(urlr.content, 'html.parser')

# Grab the div that holds the activity data, same as in the question
divdata = soup.find('div', {'data-react-class':'ActivityPublic'})
# The data-react-props attribute is a JSON string; decode it into a dict
activity_data = divdata.get("data-react-props")
activity_dict = json.loads(activity_data)

print("My ride's elevation was:", activity_dict['activity']['elevation'])
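If the page layout changes or the request is redirected to a login page, soup.find can return None and the attribute may be missing or not valid JSON. The sketch below is one possible way to guard against that; the helper names parse_react_props and summarize_activity are made up for illustration, and the sample string is an abbreviated copy of the attribute value shown in the question.

```python
import json


def parse_react_props(raw):
    """Return the decoded props dict, or None if raw is missing or not valid JSON."""
    if raw is None:  # e.g. the div was not found or has no data-react-props
        return None
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return None


def summarize_activity(props):
    """Pick a few fields out of the decoded props, tolerating missing keys."""
    activity = (props or {}).get("activity", {})
    athlete = activity.get("athlete", {})
    return {
        "name": activity.get("name"),
        "athlete": athlete.get("name"),
        "distance": activity.get("distance"),
        "time": activity.get("time"),
        "elevation": activity.get("elevation"),
    }


# Abbreviated sample in the same shape as the attribute value from the question
sample = (
    '{"activity": {"name": "Morning Ride", '
    '"athlete": {"name": "James Whyard"}, '
    '"distance": "11.7 mi", "time": "49:38", "elevation": "246 ft"}}'
)

summary = summarize_activity(parse_react_props(sample))
print(summary["elevation"])
```

With the scraping code above, the input would be something like `divdata.get("data-react-props") if divdata else None`, so a missing div yields a summary of None values rather than an AttributeError.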

Edit: @It_is_Chris suggested using the Strava API instead, https://developers.strava.com/docs/reference/, which looks like a better option.
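For anyone going the API route, here is a rough sketch of what the request might look like. It assumes you have already obtained an OAuth access token (the "YOUR_ACCESS_TOKEN" placeholder is hypothetical), and that the token is allowed to read the activity in question. The helper names are made up for illustration, and the sketch uses the standard library's urllib instead of requests only to avoid extra dependencies; check the reference linked above for the exact response fields.

```python
import json
import urllib.request

API_BASE = "https://www.strava.com/api/v3"


def build_activity_request(activity_id, access_token):
    """Build (but do not send) a GET request for a single activity."""
    url = f"{API_BASE}/activities/{activity_id}"
    # The API expects the OAuth token as a Bearer token in the Authorization header
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {access_token}"}
    )


def fetch_activity(activity_id, access_token):
    """Send the request and decode the JSON response into a dict."""
    request = build_activity_request(activity_id, access_token)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Placeholder token; building the request does not touch the network
req = build_activity_request(7487240518, "YOUR_ACCESS_TOKEN")
print(req.full_url)
```

Calling fetch_activity with a real token would perform the actual request; it is left uncalled here because it needs valid credentials.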


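Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost this content, please credit this site's URL or the original source address. For any questions, contact: yoyou2525@163.com.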

 