
Extract text from multiple tags, such as h1 and p tags that have a class, with BeautifulSoup and Python

I have figured out how to extract text from the itemprop, but I cannot extract text from <div class="someclass">Extract This Text Here!</div>. I have pasted only the part of my code that isn't working, but I can paste the entire thing if needed.

I have set up a variable with BeautifulSoup and Python to get the page, but it won't grab just the text.

Edit: Some text is wrapped in an h1 tag, and some text is in a p tag with multiple spans.

Edit 2: Some of the data is inside <div class="someclass"><h1>There's the text</h1></div> and the rest is in <p class="anotherclass"><span>This is another text</span></p>. How do I extract the text from multiple tags?

import requests
from bs4 import BeautifulSoup

# `info` holds the business listings parsed earlier (not shown in this excerpt).
for each_business in info:
    yp_bus_url = each_business.find('a', {'class': 'business-name'}).get('href')
    whole_url = "https://www.yellowpages.com" + yp_bus_url
    print(whole_url)
    bus_page = requests.get(whole_url)
    bus_soup = BeautifulSoup(bus_page.text, 'html.parser')
    # The variable below won't get text. I've tried different variations with it too but it doesn't work.
    business_name = bus_soup.find_all("div", class_="sales-info")
    print(business_name)

I've used the HTML you've given in the question to extract the text inside the <p> and <div> tags. I hope this is what you are looking for.

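from bs4 import BeautifulSoup

# Sample HTML taken from the question
html = '''<div class="someclass"><h1>There's the text</h1></div><p class="anotherclass"><span>This is another text</span></p>'''
soup = BeautifulSoup(html, 'lxml')
# .text returns the combined text of the tag and all of its descendants
print(soup.find('div', class_='someclass').text)
print(soup.find('p', class_='anotherclass').text)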

Output
There's the text
This is another text
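
When the page contains several matching tags, or the text is split across multiple spans, the same idea can be extended. The snippet below is a minimal sketch, not a tested fix for the Yellow Pages page itself: it reuses the placeholder class names from the question, assumes the p tag holds two spans, and uses the built-in 'html.parser' so that lxml is not required. Note that find_all returns a list-like ResultSet, so the text has to be read from each element rather than from the result itself.

```python
from bs4 import BeautifulSoup

# Sample markup modelled on the question; the class names are the
# question's placeholders and the two spans are an assumption.
html = """
<div class="someclass"><h1>There's the text</h1></div>
<p class="anotherclass"><span>This is</span> <span>another text</span></p>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all returns a ResultSet of tags, so call get_text() on each element.
div_texts = [div.get_text(strip=True)
             for div in soup.find_all("div", class_="someclass")]

# A comma-separated CSS selector matches several tag/class combinations
# in one call; get_text(" ", strip=True) joins the pieces with a space.
texts = [tag.get_text(" ", strip=True)
         for tag in soup.select("div.someclass h1, p.anotherclass")]

print(div_texts)
print(texts)
```

Applied to the loop in the question, this would mean iterating over the result of find_all("div", class_="sales-info") and printing each element's get_text(), instead of printing the result set directly.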
