
Cannot get <span> tag data inside <p class="info"> with BeautifulSoup

I cannot get the data from the <span> tags inside <p class="info"> using BeautifulSoup. Thanks!

from bs4 import BeautifulSoup 
import re

html = """
<p class="info">
<span>Kranji Mile Day simulcast races, 
Kranji Racecourse, SIN</span>
<span>Class 3 Handicap   -  1200M TURF</span>
<span>Saturday, 26 May 2018</span>
<span>Race 1, 5:15 PM</span>
</p>
"""
soup = BeautifulSoup(html, "html.parser")
table = soup.find('p', attrs={class:'info'})
rows = table.findAll("span")

print rows

Expected output, separated by commas:

Kranji Mile Day simulcast races, Kranji Racecourse, SIN , Class 3, Handicap, 1200M, TURF, Saturday, 26 May 2018, Race 1, 5:15PM

class is a reserved keyword in Python, so either quote it as a dictionary key in attrs or use the class_ keyword argument:

table = soup.find('p', attrs={'class':'info'})

table = soup.find('p', class_='info')
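
As a quick check of the two spellings above, here is a minimal sketch. It assumes a shortened version of the HTML from the question, and the variable names by_attrs and by_keyword are only illustrative:

```python
from bs4 import BeautifulSoup

# Shortened sample markup (an assumption; the question's HTML has four spans).
html = '<p class="info"><span>Saturday, 26 May 2018</span><span>Race 1, 5:15 PM</span></p>'
soup = BeautifulSoup(html, "html.parser")

# 'class' quoted as a dictionary key in attrs.
by_attrs = soup.find("p", attrs={"class": "info"})

# class_ keyword argument, since a bare class= is a syntax error in Python.
by_keyword = soup.find("p", class_="info")

# Both lookups find the same <p> element.
same_element = by_attrs == by_keyword
span_count = len(by_keyword.find_all("span"))
print(same_element, span_count)
```

Either form should work; class_ is simply shorter when class is the only filter.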

Use the text attribute; it concatenates all the text inside the tag.

The string attribute returns None when the tag has more than one child, for example text mixed with another tag.

print (', '.join(i.text for i in rows)) # For getting text 
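
To illustrate the difference between the two attributes, here is a small sketch. The nested <b> tag is a made-up example and is not part of the question's HTML:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the span holds text plus a nested <b> tag.
soup = BeautifulSoup("<p><span>Race 1, <b>5:15 PM</b></span></p>", "html.parser")
span = soup.find("span")

# .string is None here because the span has two children (a string and a tag).
string_value = span.string

# .text joins every piece of text found inside the span.
text_value = span.text

print(string_value)
print(text_value)
```

In the question's HTML each span holds only text, so both attributes give the same result there; text is the safer choice if the markup may contain nested tags.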

Once you resolve the class problem, as explained in the other answer, you still have to extract strings from the tags:

result = ', '.join(r.string for r in rows)
print(result)
#Kranji Mile Day simulcast races, 
# Kranji Racecourse, SIN, Class 3 Handicap   -  1200M TURF, Saturday, 26 May 2018, Race 1, 5:15 PM
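
The first span contains a line break, which is why the output above wraps. One possible way to get a single line is to collapse runs of whitespace in each span before joining. This is only a sketch: the helper name clean is hypothetical, and it does not split "Class 3 Handicap - 1200M TURF" into separate fields as the expected output in the question does.

```python
from bs4 import BeautifulSoup

html = """
<p class="info">
<span>Kranji Mile Day simulcast races, 
Kranji Racecourse, SIN</span>
<span>Class 3 Handicap   -  1200M TURF</span>
<span>Saturday, 26 May 2018</span>
<span>Race 1, 5:15 PM</span>
</p>
"""

soup = BeautifulSoup(html, "html.parser")
rows = soup.find("p", class_="info").find_all("span")


def clean(tag):
    # Hypothetical helper: split on any whitespace (including newlines)
    # and rejoin the words with single spaces.
    return " ".join(tag.get_text().split())


result = ", ".join(clean(r) for r in rows)
print(result)
```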

In Python 3 this works fine for me if you just quote 'class' in this line:

table = soup.find('p', attrs={'class':'info'})
                          ^

though the output will be the ... elements rather than just the text. Do you want the elements or only the text?

