
I can see the text, but .text won't return it (BeautifulSoup)

Running:

from bs4 import BeautifulSoup
import requests

source = requests.get('https://www.yellowpages.com/search?search_terms=bestbuy+10956&geo_location_terms=10956').text

soup = BeautifulSoup(source, 'lxml')

for article in soup.find_all("div", {"class": "result"}):

    info_primary = article.find("div", {"class": "info-section info-primary"}).text

    print(info_primary)

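When Yellow Pages has a rating for a store, this produces some noisy (numeric) characters. If a rating exists, it is stored in an "a" tag; otherwise there is no "a" tag and the markup goes straight to the "p" tag. I only want the text from the "p" tag.

Running: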

info_primary = article.find("div", {"class": "info-section info-primary"}).p.text

gives:

AttributeError: 'NoneType' object has no attribute 'text'

Running:

info_primary = article.find("div", {"class": "info-section info-primary"}).p

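After running this I can see the nested text, but I can't return it.

Looking further, the store phone number I want is outside the "p" tag. Maybe accessing the "span" tag correctly through a different class description would help?

Any ideas? Thanks!

I'm new to Python.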

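Two things. First, you have to actually find the <p> tag in order to get its text.

Second, if there is no p tag and you try to get its text, an AttributeError is raised. You can simply skip that result and move on to the next one that may have a p. (You could also check first that .find('p') is not None; the effect is the same.)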

from bs4 import BeautifulSoup
import requests

source = requests.get('https://www.yellowpages.com/search?search_terms=bestbuy+10956&geo_location_terms=10956').text

soup = BeautifulSoup(source, 'lxml')

for article in soup.find_all("div", {"class": "result"}):

    try:
        info_primary = article.find("div", {"class": "info-section info-primary"}).find('p').text
    except AttributeError:
        continue  # If there's no <p> (raises AttributeError) just continue to next loop iteration

    print(info_primary)
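
For comparison, the explicit "is not None" check mentioned in the second point might look like the sketch below. It runs against a small inline HTML string rather than the live site, so the markup, the addresses, and the helper name first_paragraph_texts are all made up for illustration, and it uses the built-in 'html.parser' instead of 'lxml' only to avoid an extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for two search results; the second has no <p>.
SAMPLE_HTML = """
<div class="result">
  <div class="info-section info-primary">
    <a class="rating">4</a>
    <p><span>100 Main St</span><span>New City, NY 10956</span></p>
  </div>
</div>
<div class="result">
  <div class="info-section info-primary">
    <a class="rating">5</a>
  </div>
</div>
"""


def first_paragraph_texts(html):
    """Return the text of the first <p> in each result, skipping results without one."""
    soup = BeautifulSoup(html, "html.parser")
    texts = []
    for article in soup.find_all("div", {"class": "result"}):
        section = article.find("div", {"class": "info-section info-primary"})
        if section is None:
            continue  # no info section in this result
        paragraph = section.find("p")
        if paragraph is None:
            continue  # no <p>: skip instead of letting .text raise AttributeError
        # Join the text of all nested tags with single spaces.
        texts.append(paragraph.get_text(" ", strip=True))
    return texts


print(first_paragraph_texts(SAMPLE_HTML))
```

Compared with try/except, this version makes it clear which lookup is allowed to fail, at the cost of a few more lines.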

The reason you can see the p tag but not its text is that the text is not directly inside the p tag; it is inside span tags.

You could do

    try:
        info_primary = article.find("div", {"class": "info-section info-primary"}).p.span.text
    except AttributeError:
        continue  # If there's no <p> (raises AttributeError) just continue to next loop iteration

But this only gives the text of the first span. To get the text of all the spans instead, you can do the following:

from bs4 import BeautifulSoup
import requests

source = requests.get('https://www.yellowpages.com/search?search_terms=bestbuy+10956&geo_location_terms=10956').text

soup = BeautifulSoup(source, 'lxml')

for article in soup.find_all("div", {"class": "result"}):

    try:
        span_data = article.find("div", {"class": "info-section info-primary"}).p.find_all('span')
        info_primary = ''
        for span in span_data:
            info_primary += ' ' + span.text
    except AttributeError:
        continue  # If there's no <p> (raises AttributeError) just continue to next loop iteration

    print(info_primary)
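
A slightly more compact variant of the same idea, again only a sketch against made-up markup: it joins the span texts with str.join (which avoids the leading space) and uses CSS selectors, which match the two classes in any order. The "span.phone" selector and the phone number are hypothetical, standing in for whatever element actually holds the phone number outside the p tag on the real page; inspect the page source to find the real class name.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; note the class order differs from the original lookup string.
SAMPLE_HTML = """
<div class="result">
  <div class="info-primary info-section">
    <p><span>100 Main St</span><span>New City, NY 10956</span></p>
    <span class="phone">(555) 010-0000</span>
  </div>
</div>
"""


def extract_result(article):
    """Return (address, phone) for one result, or None if it has no <p>."""
    # CSS class selectors match regardless of the order of the classes.
    section = article.select_one("div.info-section.info-primary")
    if section is None or section.p is None:
        return None
    # Join the text of every <span> inside the <p> with single spaces.
    address = " ".join(
        span.get_text(strip=True) for span in section.p.find_all("span")
    )
    # "span.phone" is an assumed selector, not taken from the real site.
    phone_tag = section.select_one("span.phone")
    phone = phone_tag.get_text(strip=True) if phone_tag else None
    return address, phone


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
results = [extract_result(article) for article in soup.select("div.result")]
print(results)
```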
