Parse span with Beautiful Soup: 'NoneType' object has no attribute 'text'
I'm trying to get all relationship names on a LinkedIn web page (example: https://www.linkedin.com/in/diversiti/detail/skills/(ACoAACfEjjEBNLPrc1Y8OKosqroRRScfwaCdrxI,5)/ )
(Please note the ')' char before the '5').
Here is part of the HTML code:
<div class="pv-endorsement-entity__detail pl3">
<div class="pv-endorsement-entity__name t-16 t-black t-bold truncated-text">
<span class="pv-endorsement-entity__name--has-hover">Vignesh G</span>
<span data-test-distance-badge="" id="ember122"
class="distance-badge t-black--light t-14 separator t-black--light ember-view"><span
class="visually-hidden">
out of network
</span>
<span class="dist-value" aria-hidden="true">3rd+</span>
</span>
</div>
<div class="pv-endorsement-entity__headline t-14 t-black--light t-normal">
Inventor | Engineer | MBA
</div>
</div>
I want to get the name, so in this case "Vignesh G".
Here is my Python code:
from bs4 import BeautifulSoup
from requests_html import HTMLSession
session = HTMLSession()
response = session.get('https://www.linkedin.com/in/diversiti/detail/skills/(ACoAACfEjjEBNLPrc1Y8OKosqroRRScfwaCdrxI,5)/')
soup = BeautifulSoup(response.content, 'html.parser')
content = soup.find('span', {'class': 'pv-endorsement-entity__name--has-hover'}).text
print(content)
Unfortunately, I got this error:
'NoneType' object has no attribute 'text'
I suppose that the span object is empty as far as BeautifulSoup is concerned, but how can I get the text in this object?
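For reference, the error arises because soup.find returns None when no tag matches, and None has no .text attribute. The sketch below reproduces this offline. RENDERED is taken from the HTML shown in the question, while INITIAL is a hypothetical stand-in for a response that lacks the span, and first_name_or_none is an illustrative helper name, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Fragment copied from the HTML in the question (the fully rendered page).
RENDERED = """
<div class="pv-endorsement-entity__name t-16 t-black t-bold truncated-text">
  <span class="pv-endorsement-entity__name--has-hover">Vignesh G</span>
</div>
"""

# Hypothetical stand-in for a response that does not contain the span at all.
INITIAL = "<html><head><title>LinkedIn</title></head></html>"


def first_name_or_none(html):
    """Return the text of the first name span, or None if it is absent."""
    soup = BeautifulSoup(html, "html.parser")
    span = soup.find("span", class_="pv-endorsement-entity__name--has-hover")
    # find() returns None when nothing matches, so check before reading text.
    return span.get_text(strip=True) if span is not None else None


print(first_name_or_none(RENDERED))  # Vignesh G
print(first_name_or_none(INITIAL))   # None
```

Checking for None avoids the AttributeError, but it does not make the missing element appear. If the second case applies, the HTML itself has to be obtained differently.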
LinkedIn loads the content later. The initial response does not contain a body tag, so the span you are looking for is not in the HTML that BeautifulSoup parses. You should use selenium to simulate a browser.
https://pypi.org/project/selenium/
That way, you can load the URL and wait for the page to load its content completely. It comes with utility functions such as find_element_by_tag_name (find_element(By.TAG_NAME, ...) in Selenium 4), which will work fine as a replacement for the BeautifulSoup approach that you are currently taking.