
Beautiful Soup Parse Python

I've captured the following HTML using BS4, but can't seem to search for the artist cell. I've assigned this block of HTML to a variable called container, and then tried

print container.tr.td["artist"]

without luck. Any advice would be appreciated.

<tr class="item">
  <!-- <td class="image"><a href="https://www.stargreen.com/kool-as-the-gang-44415.html" title="KOOL AS THE GANG " class="product-image"><img src="https://www.stargreen.com/media/catalog/product/cache/1/small_image/135x/9df78eab33525d08d6e5fb8d27136e95/K/o/KoolAsTheGang.jpg" width="135" height="135" alt="KOOL AS THE GANG " /></a></td> -->
  <td class="date">Sat, 30 Dec 2017</td>
  <td class="artist">kool as the gang</td>
  <td class="venue">100 club</td>
  <td class="link">
  <p class="availability out-of-stock">
    <span>Off Sale</span></p>
  </td>
</tr>

Your syntax is wrong: "artist" is the value of the "class" attribute, not the name of an attribute, so td["artist"] looks up an attribute that doesn't exist. Try this:

from bs4 import BeautifulSoup

html = """
<tr class="item">
<!-- <td class="image"><a href="https://www.stargreen.com/kool-as-the-gang-44415.html" title="KOOL AS THE GANG " class="product-image"><img src="https://www.stargreen.com/media/catalog/product/cache/1/small_image/135x/9df78eab33525d08d6e5fb8d27136e95/K/o/KoolAsTheGang.jpg" width="135" height="135" alt="KOOL AS THE GANG " /></a></td> -->
<td class="date">Sat, 30 Dec 2017</td>
<td class="artist">
                        kool as the gang                     </td>
<td class="venue">100 club</td>
<td class="link">
<p class="availability out-of-stock">
<span>Off Sale</span></p>
</td>
</tr>
"""

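soup = BeautifulSoup(html, 'html.parser')
# Find the first <td> whose class attribute is "artist"
td = soup.find('td', {'class': 'artist'})
# .text keeps the surrounding whitespace, so strip it before printing
print(td.text.strip())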

Outputs:

kool as the gang
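
To see why the original expression failed, it may help to look at what each piece returns. The sketch below assumes a shortened copy of the row from the question (the `row_html` name is made up for illustration) and the built-in 'html.parser'. Dotted access such as `container.tr.td` gives only the first `td` in the row, and subscripting a tag reads an HTML attribute by name.

```python
from bs4 import BeautifulSoup

# Shortened copy of the row from the question (assumed for illustration).
row_html = """
<tr class="item">
  <td class="date">Sat, 30 Dec 2017</td>
  <td class="artist">kool as the gang</td>
  <td class="venue">100 club</td>
</tr>
"""

container = BeautifulSoup(row_html, "html.parser")

# Dotted access returns the first matching descendant: here the "date" cell.
first_td = container.tr.td

# Subscripting reads an attribute. "class" is multi-valued, so this is a list.
first_classes = first_td["class"]

# No attribute is named "artist", so the lookup from the question raises KeyError.
try:
    first_td["artist"]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Filtering on the class value finds the intended cell instead.
artist = container.find("td", class_="artist").get_text(strip=True)

print(first_classes, lookup_failed, artist)
```

The `class_` keyword is equivalent to passing `{'class': 'artist'}`; the trailing underscore is there because `class` is a reserved word in Python.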

Another way.

Use the select method to look for elements within container whose class is 'artist'. select returns a list because several elements could match; since you know there is only one here, take the first item in the list and read its text attribute.

>>> import bs4
>>> HTML = open('sven.htm').read()
>>> container = bs4.BeautifulSoup(HTML, 'lxml')
>>> container.select('.artist')[0].text
'\n                        kool as the gang                     '
>>> container.select('.artist')[0].text.strip()
'kool as the gang'
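
If indexing with [0] feels fragile, one possible variation is select_one, which returns the first match or None rather than a list. The sketch below is only an illustration: the `html` string is an assumed stand-in for the saved page, and it uses 'html.parser' so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Assumed stand-in for the saved page; the artist cell keeps its padding.
html = """
<table>
<tr class="item">
<td class="date">Sat, 30 Dec 2017</td>
<td class="artist">
                        kool as the gang                     </td>
<td class="venue">100 club</td>
</tr>
</table>
"""

container = BeautifulSoup(html, "html.parser")

# select_one returns a single tag, or None when nothing matches,
# so a missing cell can be handled without an IndexError.
cell = container.select_one("td.artist")
artist = cell.get_text(strip=True) if cell is not None else None
print(artist)

# select still returns every match, which suits a page with many rows.
artists = [td.get_text(strip=True) for td in container.select("tr.item td.artist")]
print(artists)
```

The selector "td.artist" is slightly stricter than ".artist", since it only matches `td` elements carrying that class.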


 