
How to get the text of child tags using Beautiful Soup

I am using Beautiful Soup to scrape some data from foodily.com.

On that page there is a div with class 'ings', and I want to get the data within its p tags. For that I have written the code below:

ingredients = soup.find('div', {"class": "ings"}).findChildren('p')

It gives me the list of ingredients, but with the p tags still included.

Call get_text() on every p element found inside the div element with class="ings".

Complete working code:

from bs4 import BeautifulSoup
import requests

with requests.Session() as session:
    session.headers.update({"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.82 Safari/537.36"})
    response = session.get("http://www.foodily.com/r/0y1ygzt3zf-perfect-vanilla-cupcakes-by-annie-s")

    soup = BeautifulSoup(response.content, "html.parser")

    ingredients = [ingredient.get_text() for ingredient in soup.select('div.ings p')]
    print(ingredients)

Prints:

[
    u'For the cupcakes:', 
    u'1 stick (113g) butter/marg*', 
    u'1 cup caster sugar', u'2 eggs', 
    ...
    u'1 tbsp vanilla extract', 
    u'2-3tbsp milk', 
    u'Sprinkles to decorate, optional'
]

Note that I've also improved your locator a bit by switching to the div.ings p CSS selector.
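To see how the two locators relate, here is a minimal offline sketch. The HTML string is a made-up stand-in for the recipe page (an assumption, not the real markup), so it runs without a network request. It compares the find-then-find_all approach from the question with the div.ings p selector:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the recipe page.
html = """
<div class="ings">
  <p>For the cupcakes:</p>
  <p>1 cup caster sugar</p>
  <p>2 eggs</p>
</div>
<p>Not an ingredient</p>
"""

soup = BeautifulSoup(html, "html.parser")

# Approach from the question: locate the div, then collect its <p> descendants.
via_find = [p.get_text() for p in soup.find("div", {"class": "ings"}).find_all("p")]

# CSS selector approach: <p> elements that are descendants of div.ings.
via_select = [p.get_text() for p in soup.select("div.ings p")]

print(via_find)
print(via_select)
```

Both lists should contain the same three strings, and the p outside the div is skipped in both cases. The selector version is mostly a matter of brevity.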

Another way:

import requests
from bs4 import BeautifulSoup as bs

url = "http://www.foodily.com/r/0y1ygzt3zf-perfect-vanilla-cupcakes-by-annie-s"
source = requests.get(url)
soup = bs(source.text, "html.parser")

# find_all returns every div with class "ings"; .text is the combined text of each one
ingredients = soup.find_all('div', {"class": "ings"})
for a in ingredients:
    print(a.text)

It will print:

For the cupcakes:

1 stick (113g) butter/marg*

1 cup caster sugar

2 eggs

1 tbsp vanilla extract

1 and 1/2 cups plain flour

2 tsp baking powder

1/2 cup milk (I use Skim)

For the frosting:

2 sticks (226g) unsalted butter, at room temp

2 and 1/2 cups icing sugar, sifted

1 tbsp vanilla extract

2-3tbsp milk

Sprinkles to decorate, optional
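The blank lines in that output most likely come from whitespace between the p tags, since .text on the div returns all of its text as one string. If you want clean lines instead, a possible sketch is to use stripped_strings or get_text() with a separator and strip=True. The HTML here is a made-up sample, not the real page:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup with extra whitespace around the text.
html = """
<div class="ings">
  <p>For the cupcakes:</p>
  <p> 1 cup caster sugar </p>
  <p>2 eggs</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", class_="ings")

# stripped_strings yields each text fragment with surrounding whitespace removed
# and skips fragments that are whitespace only.
lines = list(div.stripped_strings)

# get_text with a separator and strip=True joins the same fragments into one string.
joined = div.get_text("\n", strip=True)

print(lines)
print(joined)
```

This gives one ingredient per list item (or per line) without the empty lines in between.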

If you already have the list of p tags, call get_text() on each of them. This returns only their text:

ingredient_list = [p.get_text() for p in ingredients]

The resulting list will look like:

ingredient_list = [
   'For the cupcakes:', '1 stick (113g) butter/marg*', 
   '1 cup caster sugar','2 eggs', ...
]
