
Extracting text from span class tag with beautifulsoup

I am trying to extract the text of some span elements that share a class on a website.

Here is a snippet of the HTML code:

<h1>2 Some address</h1>
                </div>
                <div id="smi-summary-items">
                    <div id="smi-price-string">&euro;230,000</div>
                    <span class="header_text"> Detached House</span><span class="bar">&nbsp;|&nbsp;</span><span class="header_text">3 Beds</span><span class="bar">&nbsp;|&nbsp;</span><span class="header_text">2 Baths</span>
                    <!-- Text_Link_Full_Ad_Unit -->
                    <div id='dfp-text_link_full_ad_unit' class='sale'>
                        <script type='text/javascript'>
                            googletag.cmd.push(function()
                                {
                                    googletag.display('dfp-text_link_full_ad_unit');
                                }
                            );
                        </script>
                    </div>

I would like to extract the text "3 Beds" and "2 Baths".

I've tried a few solutions, but I mostly get errors or an empty result.

Can anyone suggest a solution?

From what I understand, you can simply filter the desired elements by class:

[item.get_text() for item in soup.select("span.header_text")]

The complete working sample code:

from bs4 import BeautifulSoup

data = """
<div id="smi-summary-items">
    <div id="smi-price-string">&euro;230,000</div>
    <span class="header_text"> Detached House</span><span class="bar">&nbsp;|&nbsp;</span><span class="header_text">3 Beds</span><span class="bar">&nbsp;|&nbsp;</span><span class="header_text">2 Baths</span>
    <!-- Text_Link_Full_Ad_Unit -->
    <div id='dfp-text_link_full_ad_unit' class='sale'>
        <script type='text/javascript'>
            googletag.cmd.push(function()
                {
                    googletag.display('dfp-text_link_full_ad_unit');
                }
            );
        </script>
    </div>"""
soup = BeautifulSoup(data, "html.parser")
print([item.get_text(strip=True) for item in soup.select("span.header_text")])

That produces:

['Detached House', '3 Beds', '2 Baths']
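
The list above still includes the property type, while the question asks only for "3 Beds" and "2 Baths". A minimal sketch of one way to narrow it down, assuming the wanted labels always end in "Bed(s)" or "Bath(s)" (that suffix rule and the trimmed-down `html` string are assumptions for illustration, not something the page guarantees):

```python
from bs4 import BeautifulSoup

# Trimmed-down copy of the snippet from the question (assumption: same markup).
html = """
<div id="smi-summary-items">
    <span class="header_text"> Detached House</span><span class="bar">&nbsp;|&nbsp;</span><span class="header_text">3 Beds</span><span class="bar">&nbsp;|&nbsp;</span><span class="header_text">2 Baths</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
texts = [span.get_text(strip=True) for span in soup.select("span.header_text")]

# Keep only entries whose label ends in bed(s)/bath(s); this rule is an assumption.
wanted = [t for t in texts if t.lower().endswith(("bed", "beds", "bath", "baths"))]
print(wanted)  # ['3 Beds', '2 Baths']
```

If the order of the spans is reliable, slicing with `texts[1:]` would be simpler, but matching on the label is less likely to break when a listing omits one of the fields.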

The following code also extracts the text of span elements with a given class:

>>> from bs4 import BeautifulSoup
>>> content = """<h1>2 Some address</h1>
...                 </div>
...                 <div id="smi-summary-items">
...                     <div id="smi-price-string">&euro;230,000</div>
...                     <span class="header_text"> Detached House</span><span class="bar">&nbsp;|&nbsp;</span><span class="header_text">3 Beds</span><span class="bar">&nbsp;|&nbsp;</span><span class="header_text">2 Baths</span>
...                     <!-- Text_Link_Full_Ad_Unit -->
...                     <div id='dfp-text_link_full_ad_unit' class='sale'>
...                         <script type='text/javascript'>
...                             googletag.cmd.push(function()
...                                 {
...                                     googletag.display('dfp-text_link_full_ad_unit');
...                                 }
...                             );
...                         </script>
...                     </div>"""

>>> soup = BeautifulSoup(content, "html.parser")
>>> req = soup.find_all("span", {"class":"header_text"})
>>> print(req)
[<span class="header_text"> Detached House</span>, <span class="header_text">3 Beds</span>, <span class="header_text">2 Baths</span>]
>>> x23 = []
>>> for i in req:
...     x23.append(i.get_text())
...
>>> print(x23)
[' Detached House', '3 Beds', '2 Baths']
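
Note the leading space in ' Detached House' above. A more compact variant, offered as a sketch rather than the answerer's own code, uses the `class_` keyword with `get_text(strip=True)`. It also guards against `find()` returning `None`, which is a common cause of the kind of errors mentioned in the question. The class name `no_such_class` is hypothetical and exists only to trigger the no-match case.

```python
from bs4 import BeautifulSoup

html = '<span class="header_text"> Detached House</span><span class="header_text">3 Beds</span>'
soup = BeautifulSoup(html, "html.parser")

# class_ is the keyword form of {"class": ...}; strip=True trims surrounding whitespace.
cleaned = [span.get_text(strip=True) for span in soup.find_all("span", class_="header_text")]
print(cleaned)  # ['Detached House', '3 Beds']

# find() returns None when nothing matches, so check before calling get_text().
missing = soup.find("span", class_="no_such_class")  # hypothetical class name
missing_text = missing.get_text(strip=True) if missing is not None else None
print(missing_text)  # None
```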


