How to get text from span tag and span class in BeautifulSoup

I'm trying to scrape some information from a website that has the following HTML:

<div role="tabpanel">
   <ul class="css-1ijyj3z e1iszlzh2" data-testid="lblPDPInfoProduk">
      <li class="css-354z6m">
         <span>
            Kondisi<!-- -->: 
         </span>
         <span class="main">Baru</span>
      </li>
      <li class="css-354z6m">
         <span>
            Berat<!-- -->: 
         </span>
         <span class="main">500 Gram</span>
      </li>
      <li class="css-354z6m">
         <span>
            Kategori<!-- -->: 
         </span>
         <a href="https://www.tokopedia.com/p/handphone-tablet/handphone/android-os" rel="noopener noreferrer" target="_blank"><b>Android OS</b></a>
      </li>
      <li class="css-354z6m">
         <span>
            Etalase<!-- -->: 
         </span>
         <a href="https://www.tokopedia.com/ofan-store8/etalase/xiaomi" rel="noopener noreferrer" target="_blank"><b>Xiaomi</b></a>
      </li>
   </ul>
   <div class="css-1dwge1q">
      <span class="css-11oczh8 e1iszlzh0">
         <span class="css-17zm3l e1iszlzh1">
            <div data-testid="lblPDPDescriptionProduk">Produk segel<br/>Kualitas terjamin keasliannya <br/>bergaransi TAM<br/>Produk kami kirim dlm keadaan ssgel...<br/><br/>Note : <br/>UNTUK PARA PEMBELI MOHON DI BACA SEBELUM MEMBELI..... <br/><br/>untuk garansi Kami akan aktivasi sesuai dengan invoice pembelian di  TOKOPEDIA oleh Promotor Xiaomi kami dengan cara sebagai berikut : <br/><br/>imei handphone di sold out menggunakan sistem aplikasi yg ada di setiap handphone para promotor xiaomi... dan kami pastikan produk tdk lah di unboxing tp msh tetap dlm keadaan segel. mohon maaf kami tdk akan melayani komplain apabila aktivasi garansi sudah sesuai dengan invoice pembelian di tokopedia.  untuk para pembeli dgn  melakukan pembelian maka kami anggap sudah setuju dgn peraturan toko <br/><br/>JADILAH PEMBELI YG BIJAKSANA<br/></div>
         </span>
      </span>
      <button class="css-5lrz2e" data-testid="btnPDPSeeMore" type="button">Lihat Selengkapnya</button>
   </div>
</div>

I'm trying to scrape the product details and store them in a list, so that the output is:

Kondisi: Baru
Berat: 500 Gram
Kategori: Android OS
Etalase: Xiaomi

I tried:

description = []
quotes = soup.find_all('div', {'role': 'tabpanel'})
for item in quotes:
    desc = item.find('span').text
    description.append(desc)

but the output is only:

['Kondisi: ']

How can I correct this code? Thank you!

You grab the <div> tag. When you call find() on it, you get only the first matching tag. What you actually want is find_all(). And instead of the <span> tags, go after the <li> tags. Then you can iterate through those to pull out the text from each one.
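As a minimal sketch of that difference, the following uses a hypothetical two-item fragment (not the markup from the question) to contrast what find() and find_all() return:

```python
from bs4 import BeautifulSoup

# Hypothetical two-item fragment, used only to contrast the two calls.
snippet = (
    "<ul>"
    "<li><span>A: </span><span>1</span></li>"
    "<li><span>B: </span><span>2</span></li>"
    "</ul>"
)
soup = BeautifulSoup(snippet, "html.parser")

# find() returns the first matching Tag, or None when nothing matches.
first_span = soup.find("span")

# find_all() returns every match as a list-like ResultSet.
all_items = soup.find_all("li")

print(first_span.get_text())
print([li.get_text() for li in all_items])
```

Because find() can return None, checking the result before reading its text avoids an AttributeError when the tag is missing.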

Given:

html = '''<div role="tabpanel">
   <ul class="css-1ijyj3z e1iszlzh2" data-testid="lblPDPInfoProduk">
      <li class="css-354z6m">
         <span>
            Kondisi<!-- -->: 
         </span>
         <span class="main">Baru</span>
      </li>
      <li class="css-354z6m">
         <span>
            Berat<!-- -->: 
         </span>
         <span class="main">500 Gram</span>
      </li>
      <li class="css-354z6m">
         <span>
            Kategori<!-- -->: 
         </span>
         <a href="https://www.tokopedia.com/p/handphone-tablet/handphone/android-os" rel="noopener noreferrer" target="_blank"><b>Android OS</b></a>
      </li>
      <li class="css-354z6m">
         <span>
            Etalase<!-- -->: 
         </span>
         <a href="https://www.tokopedia.com/ofan-store8/etalase/xiaomi" rel="noopener noreferrer" target="_blank"><b>Xiaomi</b></a>
      </li>
   </ul>
   <div class="css-1dwge1q">
      <span class="css-11oczh8 e1iszlzh0">
         <span class="css-17zm3l e1iszlzh1">
            <div data-testid="lblPDPDescriptionProduk">Produk segel<br/>Kualitas terjamin keasliannya <br/>bergaransi TAM<br/>Produk kami kirim dlm keadaan ssgel...<br/><br/>Note : <br/>UNTUK PARA PEMBELI MOHON DI BACA SEBELUM MEMBELI..... <br/><br/>untuk garansi Kami akan aktivasi sesuai dengan invoice pembelian di  TOKOPEDIA oleh Promotor Xiaomi kami dengan cara sebagai berikut : <br/><br/>imei handphone di sold out menggunakan sistem aplikasi yg ada di setiap handphone para promotor xiaomi... dan kami pastikan produk tdk lah di unboxing tp msh tetap dlm keadaan segel. mohon maaf kami tdk akan melayani komplain apabila aktivasi garansi sudah sesuai dengan invoice pembelian di tokopedia.  untuk para pembeli dgn  melakukan pembelian maka kami anggap sudah setuju dgn peraturan toko <br/><br/>JADILAH PEMBELI YG BIJAKSANA<br/></div>
         </span>
      </span>
      <button class="css-5lrz2e" data-testid="btnPDPSeeMore" type="button">Lihat Selengkapnya</button>
   </div>
</div>'''

Code:

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')

description = []
quotes = soup.find_all('div', {'role': 'tabpanel'})
for item in quotes:
    desc = item.find_all('li')
    for each in desc:
        description.append(each.text.split())

Output:

print(description)
[['Kondisi:', 'Baru'], ['Berat:', '500', 'Gram'], ['Kategori:', 'Android', 'OS'], ['Etalase:', 'Xiaomi']]
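The token lists above can be joined back into single strings if the goal is the exact "label: value" lines shown in the question. This is only a sketch: the markup is a shortened copy of the question's HTML, and the helper name list_item_lines is made up for illustration.

```python
from bs4 import BeautifulSoup

# Shortened copy of the markup from the question (two items, description omitted).
html = """
<div role="tabpanel">
   <ul data-testid="lblPDPInfoProduk">
      <li class="css-354z6m">
         <span>
            Kondisi<!-- -->:
         </span>
         <span class="main">Baru</span>
      </li>
      <li class="css-354z6m">
         <span>
            Kategori<!-- -->:
         </span>
         <a href="https://www.tokopedia.com/p/handphone-tablet/handphone/android-os"><b>Android OS</b></a>
      </li>
   </ul>
</div>
"""


def list_item_lines(markup):
    """Return one whitespace-normalized string per <li> in the tab panel."""
    soup = BeautifulSoup(markup, "html.parser")
    panel = soup.find("div", {"role": "tabpanel"})
    if panel is None:
        return []
    # split() with no arguments discards all runs of whitespace (newlines
    # and indentation included); joining with one space rebuilds each line.
    return [" ".join(li.get_text().split()) for li in panel.find_all("li")]


lines = list_item_lines(html)
print(lines)  # ['Kondisi: Baru', 'Kategori: Android OS']
```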

You can try this:

description = {}
quotes = soup.find_all('div', {'role': 'tabpanel'})
for item in quotes:
    for a in item.find("ul").find_all('li'):
        # Split the item's text into lines: the first holds the label, the last holds the value
        ls = a.text.strip().split('\n')
        # Strip the indentation that the HTML formatting leaves around each part
        description[ls[0].strip()] = ls[-1].strip()
print(description)

The output is:

{'Kondisi:': 'Baru', 'Berat:': '500 Gram', 'Kategori:': 'Android OS', 'Etalase:': 'Xiaomi'}

Try searching for the class css-354z6m and calling the .get_text() method:

soup = BeautifulSoup(html, "html.parser")
print([tag.get_text(strip=True) for tag in soup.find_all(class_="css-354z6m")])

Output:

['Kondisi:Baru', 'Berat:500 Gram', 'Kategori:Android OS', 'Etalase:Xiaomi']
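With strip=True the label and the value end up glued together with no space between them. If they are wanted as separate fields, one possible sketch is to read the label from the first <span> and the value from the span.main or <a> element in each item. The markup here is a shortened copy of the question's HTML, and label_value_pairs is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Shortened copy of the markup from the question (two items only).
html = """
<ul data-testid="lblPDPInfoProduk">
   <li class="css-354z6m">
      <span>
         Kondisi<!-- -->:
      </span>
      <span class="main">Baru</span>
   </li>
   <li class="css-354z6m">
      <span>
         Kategori<!-- -->:
      </span>
      <a href="https://www.tokopedia.com/p/handphone-tablet/handphone/android-os"><b>Android OS</b></a>
   </li>
</ul>
"""


def label_value_pairs(markup):
    """Map each item's label (without the colon) to its value text."""
    soup = BeautifulSoup(markup, "html.parser")
    pairs = {}
    for li in soup.find_all("li", class_="css-354z6m"):
        label_tag = li.find("span")                # first span holds the label
        value_tag = li.select_one("span.main, a")  # value is a span.main or a link
        if label_tag is None or value_tag is None:
            continue  # skip items that do not follow the expected layout
        label = label_tag.get_text(strip=True).rstrip(":").strip()
        pairs[label] = value_tag.get_text(strip=True)
    return pairs


pairs = label_value_pairs(html)
print(pairs)  # {'Kondisi': 'Baru', 'Kategori': 'Android OS'}
```

Generated class names such as css-354z6m tend to change between site builds, so a selector based on the data-testid attribute may prove more stable.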

If you can use htql, here is the code:

import htql
for a,b in htql.query(html, "<li> &tx {a=/':'/1 &trim; b=/':'/2 &trim } "):  
  print("%s: %s" % (a,b) ) 

It prints:

Kondisi: Baru
Berat: 500 Gram
Kategori: Android OS
Etalase: Xiaomi
