
Stuck with HTML scraping using BeautifulSoup (Python)


       </li>
      </ul>
     </div>
    </nav>
   </header>
   <div data-react-class="ActivityPublic" data-react-props='{
  "activity": {
    "name": "Morning Ride",
    "date": "Today",
    "athlete": {
      "name": "James Whyard",
      "avatarUrl": "https://lh3.googleusercontent.com/a-/AOh14GiA8yxgfozOqSJEiwW9srS-VEZU_mV_UM2iHFZxjw=s96-c",
      "location": "",
      "followersCount": 3,
      "followAthleteUrl": "http://www.strava.com/register?activity_action=athlete\u0026activity_id=7487240518\u0026athlete_id=90220142\u0026content=90220142\u0026cta=follow\u0026element=button\u0026follow_athlete_after_login=true\u0026follow_athlete_after_registration=true\u0026follow_athlete_id=90220142\u0026source=activities_show",
      "totalDistance": "452",
      "distanceUnit": "miles",
      "totalActivities": 40
    },
    "type": "Ride",
    "detailedType": "Ride",
    "kudosCount": 0,
    "comments": [],
    "commentCount": 0,
    "achievementsCount": 11,
    "distance": "11.7 mi",
    "time": "49:38",
    "elevation": "246 ft",
    "calories": 526.0,
    "streams": {
      "altitude": [6.6, 6.6, 6.6, 6.7, 6.7, 6.7, 6.7, 6.7, 6.7, 6.9, 6.7, 6.6, 6.5, 6.4, 6.4, 6.4, 6.4, 6.2, 5.9, 6.0, 5.9, 5.8, 5.7, 5.6, 5.6, 5.6, 5.7, 5.9, 6.0, 6.0, 5.9, 5.9, 5.9, 6.0, 6.0, 6.0, 6.0, 6.0, 6.1, 6.2, 6.2, 6.4, 6.5, 6.5, 6.6, 6.9, 7.2, 7.2, 7.4

import requests
from bs4 import BeautifulSoup

url = 'https://www.strava.com/activities/7487240518'
urlr = requests.get(url)

soup = BeautifulSoup(urlr.content, 'html.parser')

divdata = soup.find('div', {'data-react-class':'ActivityPublic'})
strdata = divdata.get('data-react-props')
print(strdata)

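You're very close!

All I would do is grab the div element as you did, then read its data-react-props attribute, which contains all the data you're looking for. Its value is plainly formatted as JSON, so we can parse it as such and pull everything we need from the resulting dict.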

import json

import requests
from bs4 import BeautifulSoup

url = 'https://www.strava.com/activities/7487240518'
urlr = requests.get(url)

soup = BeautifulSoup(urlr.content, 'html.parser')

# Locate the div that carries the React props for the activity page
divdata = soup.find('div', {'data-react-class': 'ActivityPublic'})
# The attribute value is a JSON string; parse it into a dict
activity_data = divdata.get("data-react-props")
activity_dict = json.loads(activity_data)

print("My ride's elevation was:", activity_dict['activity']['elevation'])
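To experiment with the parsed data without hitting the site, here is a minimal sketch that picks up where `divdata.get("data-react-props")` leaves off. `SAMPLE_PROPS` is a trimmed-down stand-in for the attribute value shown in the question, and `summarize_activity` is a hypothetical helper name, not part of any library. It uses `.get()` throughout on the assumption that some keys may be missing on other activity pages.

```python
import json

# Trimmed-down stand-in for the data-react-props value shown in the question.
SAMPLE_PROPS = '''{
  "activity": {
    "name": "Morning Ride",
    "athlete": {"name": "James Whyard"},
    "distance": "11.7 mi",
    "time": "49:38",
    "elevation": "246 ft",
    "calories": 526.0,
    "streams": {"altitude": [6.6, 6.6, 6.7, 6.9, 7.2, 7.4]}
  }
}'''


def summarize_activity(props_json):
    """Parse the attribute string and pull out a few fields, tolerating missing keys."""
    activity = json.loads(props_json).get("activity", {})
    altitude = activity.get("streams", {}).get("altitude", [])
    return {
        "name": activity.get("name"),
        "athlete": activity.get("athlete", {}).get("name"),
        "distance": activity.get("distance"),
        "time": activity.get("time"),
        "elevation": activity.get("elevation"),
        # Highest sample in the altitude stream, or None if the stream is absent
        "max_altitude": max(altitude) if altitude else None,
    }


summary = summarize_activity(SAMPLE_PROPS)
print(summary)
```

In the real script, `activity_data` would be passed in place of `SAMPLE_PROPS`. Note that values such as `distance` and `elevation` arrive as display strings with units ("11.7 mi", "246 ft"), so they need further parsing before any arithmetic.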

Edit: @It_is_Chris suggested using the Strava API instead, https://developers.strava.com/docs/reference/, which looks like a better option.
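As a rough sketch of the API route, the snippet below builds a request for a single activity. It assumes the `GET /activities/{id}` endpoint under `https://www.strava.com/api/v3` described in the linked reference, plus an OAuth access token obtained separately. `YOUR_ACCESS_TOKEN` is a placeholder, and `build_activity_request` and `fetch_activity` are hypothetical helper names. The fields in the API response are not guaranteed to match those in the page's data-react-props.

```python
import json
import urllib.request

# Assumed base URL for the Strava v3 API; check the linked reference.
API_BASE = "https://www.strava.com/api/v3"


def build_activity_request(activity_id, access_token):
    """Build (but do not send) an authenticated GET request for one activity."""
    return urllib.request.Request(
        f"{API_BASE}/activities/{activity_id}",
        headers={"Authorization": f"Bearer {access_token}"},
    )


def fetch_activity(activity_id, access_token):
    """Send the request and decode the JSON body; requires a valid token."""
    request = build_activity_request(activity_id, access_token)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Building the request needs no network access; the token is a placeholder.
request = build_activity_request(7487240518, "YOUR_ACCESS_TOKEN")
print(request.full_url)
```

Calling `fetch_activity(7487240518, token)` with a real token would return the activity as a dict. Access to another athlete's activity depends on the token's scopes and on the activity's privacy settings.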

