I see the text, but cannot return it with .text (BeautifulSoup)
I run:
from bs4 import BeautifulSoup
import requests
source = requests.get('https://www.yellowpages.com/search?search_terms=bestbuy+10956&geo_location_terms=10956').text
soup = BeautifulSoup(source, 'lxml')
for article in soup.find_all("div", {"class": "result"}):
    info_primary = article.find("div", {"class": "info-section info-primary"}).text
    print(info_primary)
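Rather than hand-writing the query string, where a stray space or unescaped character is easy to slip in, requests can encode it from a dict passed as params. A minimal sketch: it only prepares the request (no network call) so the final URL can be inspected offline; passing the same dict to requests.get(url, params=params) sends that URL.

```python
import requests

# Query parameters as a dict; requests URL-encodes them (a space becomes "+").
params = {"search_terms": "bestbuy 10956", "geo_location_terms": "10956"}

# Build the request without sending it, so the encoded URL can be checked offline.
prepared = requests.Request("GET", "https://www.yellowpages.com/search", params=params).prepare()
print(prepared.url)
```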
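When Yellow Pages rates a store, the output contains some noisy (numeric) characters. If a rating exists, it is stored in an "a" tag; otherwise there is no "a" tag and the markup goes straight to the "p" tag. I only want the text from the "p" tag.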
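To make that structure concrete, here is a sketch against a small, hypothetical snippet (the markup and the "rating" class are made up to mirror the description, not copied from Yellow Pages). Reading the text of the whole div picks up the optional rating in the a tag, while going through find("p") keeps only the p contents.

```python
from bs4 import BeautifulSoup

# Hypothetical markup mirroring the description: only the first result has a
# rating <a>; both results keep their address text inside a <p>.
html = """
<div class="result">
  <div class="info-section info-primary">
    <a class="rating">4.5</a>
    <p><span>123 Main St</span><span>Springfield</span></p>
  </div>
</div>
<div class="result">
  <div class="info-section info-primary">
    <p><span>456 Oak Ave</span><span>Shelbyville</span></p>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
results = []
for article in soup.find_all("div", {"class": "result"}):
    primary = article.find("div", {"class": "info-section info-primary"})
    whole = primary.get_text(" ", strip=True)             # includes the rating, if present
    p_only = primary.find("p").get_text(" ", strip=True)  # only what is inside <p>
    results.append((whole, p_only))
    print(whole, "->", p_only)
```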
Then I run:
info_primary = article.find("div", {"class": "info-section info-primary"}).p.text
and get:
AttributeError: 'NoneType' object has no attribute 'text'
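The error comes from chaining: find (and the .p shorthand, which calls find("p")) returns None when nothing matches, and None has no .text attribute. A minimal reproduction on a hypothetical section that has no p tag, followed by a None-safe read:

```python
from bs4 import BeautifulSoup

# Hypothetical primary section with no <p> inside it.
soup = BeautifulSoup('<div class="info-section info-primary"><h2>Store</h2></div>', "html.parser")
primary = soup.find("div", {"class": "info-section info-primary"})

print(primary.p is None)  # .p returns None when there is no <p> descendant
try:
    primary.p.text
except AttributeError as exc:
    print("AttributeError:", exc)

# None-safe version: read .text only if the tag was found.
p = primary.find("p")
text = p.text if p is not None else ""
print(repr(text))
```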
If I run:
info_primary = article.find("div", {"class": "info-section info-primary"}).p
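After running this, I can see the nested text, but I can't return it.
Looking further, the store phone number I want is outside the "p" tag. Maybe accessing the "span" tag correctly through a different class description would help?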
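If the phone number sits in its own element outside the p tag, it can be targeted directly by class instead of walking through p. A sketch using find with the class_ keyword; the markup and the "phone" class name are hypothetical stand-ins, so check the real class name in the page source.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: address inside <p>, phone number in a sibling <span>.
html = """
<div class="info-section info-primary">
  <p><span>123 Main St</span></p>
  <span class="phone">(555) 010-0000</span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
primary = soup.find("div", {"class": "info-section info-primary"})

# class_ (trailing underscore, since "class" is a Python keyword) filters by CSS class.
phone_tag = primary.find("span", class_="phone")
phone = phone_tag.get_text(strip=True) if phone_tag is not None else None
print(phone)
```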
Any ideas? Thanks!
I'm new to Python.
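Two things. First, you need to actually get hold of the <p> tag to read its text; .find('p') does this (.p is shorthand for the same call). Second, if there is no p tag and you try to get its text, an AttributeError is raised: you can simply ignore that result and move on to the next one, which may have a p. (You could also first check that .find('p') is not None; the effect is the same.)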
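The "is not None" variant can be sketched like this; it behaves like the try/except version but makes the skipped case explicit. It parses a hypothetical inline snippet instead of the live page.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page; the second result has no <p>.
source = """
<div class="result"><div class="info-section info-primary"><p>Best Buy, 1 Main St</p></div></div>
<div class="result"><div class="info-section info-primary"><h2>No address</h2></div></div>
"""
soup = BeautifulSoup(source, "html.parser")

found = []
for article in soup.find_all("div", {"class": "result"}):
    primary = article.find("div", {"class": "info-section info-primary"})
    p = primary.find("p") if primary is not None else None
    if p is None:
        continue  # same effect as the except AttributeError branch
    found.append(p.text)
    print(p.text)
```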
from bs4 import BeautifulSoup
import requests
source = requests.get('https://www.yellowpages.com/search?search_terms=bestbuy+10956&geo_location_terms=10956').text
soup = BeautifulSoup(source, 'lxml')
for article in soup.find_all("div", {"class": "result"}):
    try:
        info_primary = article.find("div", {"class": "info-section info-primary"}).find('p').text
    except AttributeError:
        continue  # If there's no <p> (raises AttributeError), just continue to the next loop iteration
    print(info_primary)
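When you print the p tag, you see its text inside nested span tags: the text is not a direct child of p but lives in the spans. Note that .text already concatenates the text of all descendants, so p.text does include the span text; the AttributeError comes from results that have no <p> at all.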
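A quick way to see how .text treats nested tags, on a hypothetical p holding two spans: .text on the p runs every descendant string together, .span.text gives only the first span, and get_text(" ", strip=True) joins the pieces with a separator.

```python
from bs4 import BeautifulSoup

# Hypothetical <p> whose visible text is split across nested <span> tags.
p = BeautifulSoup("<p><span>123 Main St</span><span>Springfield</span></p>", "html.parser").p

all_text = p.text                     # every descendant string, run together
first_span = p.span.text              # only the first <span>
joined = p.get_text(" ", strip=True)  # descendant strings joined with a space
print(all_text, "|", first_span, "|", joined)
```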
You could do:
for article in soup.find_all("div", {"class": "result"}):
    try:
        info_primary = article.find("div", {"class": "info-section info-primary"}).p.span.text
    except AttributeError:
        continue  # If there's no <p> or <span> (raises AttributeError), skip to the next result
    print(info_primary)
But this only yields the text of the first span. To get the text of all the spans instead, you can do the following:
from bs4 import BeautifulSoup
import requests
source = requests.get('https://www.yellowpages.com/search?search_terms=bestbuy+10956&geo_location_terms=10956').text
soup = BeautifulSoup(source, 'lxml')
for article in soup.find_all("div", {"class": "result"}):
    try:
        span_data = article.find("div", {"class": "info-section info-primary"}).p.find_all('span')
    except AttributeError:
        continue  # If there's no <p> (raises AttributeError), just continue to the next loop iteration
    # Join the text of every <span> with single spaces (no leading space)
    info_primary = ' '.join(span.text for span in span_data)
    print(info_primary)
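To test the extraction without hitting the site on every run, the same logic can be wrapped in a function that takes an HTML string. A sketch; the function name extract_primary_info and the sample markup are hypothetical.

```python
from bs4 import BeautifulSoup


def extract_primary_info(html):
    """Return the joined <span> text of each result's primary <p>, skipping results without one."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for article in soup.find_all("div", {"class": "result"}):
        primary = article.find("div", {"class": "info-section info-primary"})
        p = primary.find("p") if primary is not None else None
        if p is None:
            continue  # no <p> in this result
        rows.append(" ".join(span.get_text(strip=True) for span in p.find_all("span")))
    return rows


# Hypothetical sample: a rated result, then a result with no <p>.
sample = """
<div class="result"><div class="info-section info-primary">
  <a>4.5</a><p><span>123 Main St</span><span>Springfield</span></p>
</div></div>
<div class="result"><div class="info-section info-primary"><h2>Store</h2></div></div>
"""
print(extract_primary_info(sample))
```

Because the function takes a string, the live call only needs to pass requests.get(...).text into it.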