Getting None in Beautiful Soup web scraping

My intent

I want to scrape the commit count of a repository from GitHub using Beautiful Soup with Python.

My issue

My script returns None.

My code

from bs4 import BeautifulSoup
import requests

html = requests.get('https://github.com/pnp/cli-microsoft365').text
soup = BeautifulSoup(html, 'html.parser')
commits = soup.find('strong', class_='repo-content-pjax-container > div > div.gutter-condensed.gutter-lg.flex-column.flex-md-row.d-flex > div.flex-shrink-0.col-12.col-md-9.mb-4.mb-md-0 > div.Box.mb-3 > div.Box-header.position-relative > div > div:nth-child(4) > ul > li > a > span > strong')
print(commits)

What happens?

You are using a "wild mix" in your find(): you pass a CSS selector path as the class_ argument, which will not lead to the element you expect to find. That is why you get None.
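
A minimal, self-contained illustration (the HTML below is made up for the example, not taken from GitHub): find() compares class_ against an element's class attribute instead of interpreting it as a selector path, so the original call matches nothing, while select_one() does accept a CSS selector:

from bs4 import BeautifulSoup

html = '<div class="box"><strong class="num">1,664</strong></div>'
soup = BeautifulSoup(html, 'html.parser')

# class_ is compared against the class attribute, not parsed as a CSS path
print(soup.find('strong', class_='div.box > strong.num'))  # None
print(soup.find('strong', class_='num'))                   # <strong class="num">1,664</strong>
print(soup.select_one('div.box > strong.num'))             # <strong class="num">1,664</strong>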

How to fix?

Use a CSS selector to chain the parts you are looking for; in this case it picks the <svg> in front of the commit count and its next <span> element, which contains the <strong>:

soup.select_one('svg.octicon.octicon-history + span strong').text 

Output (at the moment of my request)

1,664
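
Putting it together, a minimal sketch of the full script with the suggested selector replacing the original find() call; the selector reflects GitHub's markup at the time of the answer and may return nothing if the page structure changes or the counter is rendered client-side:

from bs4 import BeautifulSoup
import requests

# Fetch and parse the repository page
html = requests.get('https://github.com/pnp/cli-microsoft365').text
soup = BeautifulSoup(html, 'html.parser')

# The commit count sits in the <strong> inside the <span> right after the history icon
commits = soup.select_one('svg.octicon.octicon-history + span strong')
print(commits.text if commits else 'commit counter not found')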
