Stuck with HTML scraping using BeautifulSoup (Python)
</li>
</ul>
</div>
</nav>
</header>
<div data-react-class="ActivityPublic" data-react-props='{
"activity": {
"name": "Morning Ride",
"date": "Today",
"athlete": {
"name": "James Whyard",
"avatarUrl": "https://lh3.googleusercontent.com/a-/AOh14GiA8yxgfozOqSJEiwW9srS-VEZU_mV_UM2iHFZxjw=s96-c",
"location": "",
"followersCount": 3,
"followAthleteUrl": "http://www.strava.com/register?activity_action=athlete\u0026activity_id=7487240518\u0026athlete_id=90220142\u0026content=90220142\u0026cta=follow\u0026element=button\u0026follow_athlete_after_login=true\u0026follow_athlete_after_registration=true\u0026follow_athlete_id=90220142\u0026source=activities_show",
"totalDistance": "452",
"distanceUnit": "miles",
"totalActivities": 40
},
"type": "Ride",
"detailedType": "Ride",
"kudosCount": 0,
"comments": [],
"commentCount": 0,
"achievementsCount": 11,
"distance": "11.7 mi",
"time": "49:38",
"elevation": "246 ft",
"calories": 526.0,
"streams": {
"altitude": [6.6, 6.6, 6.6, 6.7, 6.7, 6.7, 6.7, 6.7, 6.7, 6.9, 6.7, 6.6, 6.5, 6.4, 6.4, 6.4, 6.4, 6.2, 5.9, 6.0, 5.9, 5.8, 5.7, 5.6, 5.6, 5.6, 5.7, 5.9, 6.0, 6.0, 5.9, 5.9, 5.9, 6.0, 6.0, 6.0, 6.0, 6.0, 6.1, 6.2, 6.2, 6.4, 6.5, 6.5, 6.6, 6.9, 7.2, 7.2, 7.4
import requests
from bs4 import BeautifulSoup
url = 'https://www.strava.com/activities/7487240518'
urlr = requests.get(url)
soup = BeautifulSoup(urlr.content, 'html.parser')
divdata = soup.find('div', {'data-react-class':'ActivityPublic'})
strdata = divdata.get('data-react-props')
print(strdata)
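You're very close!
What I would do is grab the div element just as you did, then read its data-react-props attribute, which contains all the data you are looking for.
That attribute is formatted as JSON, so we can parse it as JSON and pull everything we need out of the resulting dictionary.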
import json

import requests
from bs4 import BeautifulSoup

url = 'https://www.strava.com/activities/7487240518'
urlr = requests.get(url)
soup = BeautifulSoup(urlr.content, 'html.parser')
# Locate the div that carries the activity data for the React component
divdata = soup.find('div', {'data-react-class': 'ActivityPublic'})
# The attribute value is a JSON string; parse it into a dict
activity_data = divdata.get("data-react-props")
activity_dict = json.loads(activity_data)
print("My ride's elevation was:", activity_dict['activity']['elevation'])
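If the page ever comes back without that div (a login wall or a layout change, for example), `soup.find` returns `None` and the snippet above fails with an `AttributeError`. Here is a minimal sketch of the same idea wrapped in a function that guards against that. The function name `extract_activity` and the shortened `SAMPLE_HTML` string are my own, made up for illustration so the example runs without a network request; the sample keeps only a few keys from the markup in the question.

```python
import json

from bs4 import BeautifulSoup


def extract_activity(html):
    """Return the 'activity' dict embedded in the page, or None if it is absent.

    Hypothetical helper for illustration; not part of requests or bs4.
    """
    soup = BeautifulSoup(html, "html.parser")
    div = soup.find("div", {"data-react-class": "ActivityPublic"})
    if div is None:
        return None  # the expected div is not on this page
    props = div.get("data-react-props")
    if props is None:
        return None  # div found, but it carries no props attribute
    return json.loads(props).get("activity")


# Shortened stand-in for the real page, modelled on the markup in the question.
SAMPLE_HTML = """
<div data-react-class="ActivityPublic" data-react-props='{
  "activity": {
    "name": "Morning Ride",
    "distance": "11.7 mi",
    "time": "49:38",
    "elevation": "246 ft",
    "streams": {"altitude": [6.6, 6.6, 6.7]}
  }
}'></div>
"""

activity = extract_activity(SAMPLE_HTML)
print(activity["name"], activity["distance"], activity["elevation"])

# A page without the div yields None instead of raising.
print(extract_activity("<p>login required</p>"))
```

With the live page you would pass `urlr.content` (or `urlr.text`) to the function instead of `SAMPLE_HTML` and check for `None` before indexing into the result.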
Edit: @It_is_Chris suggested using the Strava API instead, https://developers.strava.com/docs/reference/ . That looks like a better option.
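For the API route, a rough sketch of what a request for a single activity could look like is below. This is an assumption on my part rather than something tested against a live account: to my understanding the v3 API exposes activities under `/api/v3/activities/{id}` and expects an OAuth access token sent as a Bearer header, so please check the reference linked above for the exact endpoint and required scopes. The helper name `build_activity_request` and the placeholder token are hypothetical. The code only builds the request and does not send it.

```python
import urllib.request

# Assumed base URL for the Strava v3 API; verify against the official reference.
API_BASE = "https://www.strava.com/api/v3"


def build_activity_request(activity_id, access_token):
    """Build (but do not send) a GET request for one activity.

    Hypothetical helper for illustration only.
    """
    url = f"{API_BASE}/activities/{activity_id}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {access_token}"},
    )


# "YOUR_ACCESS_TOKEN" is a placeholder; a real token comes from Strava's OAuth flow.
req = build_activity_request(7487240518, "YOUR_ACCESS_TOKEN")
print(req.get_method(), req.full_url)
```

To actually send it, you could pass `req` to `urllib.request.urlopen` and feed the response to `json.load`. The token has to be authorised to view that activity, otherwise the API will refuse the request.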