
Having trouble indexing into recipe with BeautifulSoup

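I'm writing a program that iterates through the recipe website The Woks of Life, extracts each recipe, and stores it in a CSV file. I've managed to extract the links for storage purposes, but I'm having trouble extracting elements on the page. The link to the website is https://thewoksoflife.com/baked-white-pepper-chicken-wings/. The elements I'm trying to reach are the name, cook time, ingredients, calories, instructions, etc.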

import requests
from bs4 import BeautifulSoup

def parse_recipe(link):
    # hardcoded link for now until I get it working
    page = requests.get("https://thewoksoflife.com/baked-white-pepper-chicken-wings/")
    soup = BeautifulSoup(page.content, 'html.parser')
    for i in soup.find_all("script", {"class": "yoast-schema-graph yoast-schema-graph--main"}):
        print(i.get("name"))  # should print "Baked White Pepper Chicken Wings" but prints "None"

For reference, when I print(i), I get:

<script class="yoast-schema-graph yoast-schema-graph--main" type="application/ld+json"> 
   {"@context":"https://schema.org","@graph": 
   [{"@type":"Organization","@id":"https://thewoksoflife.com/#organization","name":"The Woks of 
    Life","url":"https://thewoksoflife.com/","sameAs": 
   ["https://www.facebook.com/thewoksoflife","https://twitter.com/thewoksoflife"],"logo": 
{"@type":"ImageObject","@id":"https://thewoksoflife.com/#logo","url":"https://thewoksoflife.com/wp- 
content/uploads/2019/05/Temporary-Logo-e1556728319201.png","width":365,"height":364,"caption":"The 
Woks of Life"},"image":{"@id":"https://thewoksoflife.com/#logo"}}{"@type":"WebSite","@id":"https://thewoksoflife.com/#website","url":"https://thewoksoflife.com/","name": 
   "The Woks of Life","description":"a culinary genealogy","publisher": 
   {"@id":"https://thewoksoflife.com/#organization"},"potentialAction": 
   {"@type":"SearchAction","target":"https://thewoksoflife.com/?s={search_term_string}","query- 
   input":"required name=search_term_string"}}, 
   {"@type":"ImageObject","@id":"https://thewoksoflife.com/baked-white-pepper-chicken- 
   wings/#primaryimage","url":"https://thewoksoflife.com/wp-content/uploads/2019/11/white-pepper- 
   chicken-wings-9.jpg","width":600,"height":836,"caption":"Crispy Baked White Pepper Chicken Wings, 
   thewoksoflife.com"},{"@type":"WebPage","@id":"https://thewoksoflife.com/baked-white-pepper- 
   chicken-wings/#webpage","url":"https://thewoksoflife.com/baked-white-pepper-chicken- 
   wings/","inLanguage":"en-US","name":"Baked White Pepper Chicken Wings | The Woks of 
   Life", .................. #continues onwards

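I'm trying to access the "name" located at the end of the snippet above (along with other elements that are similarly inaccessible), but I'm unable to do so. Any help would be appreciated!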
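
A note on why the call prints None: Tag.get() in BeautifulSoup looks up HTML attributes of the tag (such as class or type), not keys inside the tag's text. The "name" here is a key in the JSON text inside the <script> element, so it is not an attribute. The following is a minimal offline sketch of that difference; the inline HTML is a shortened stand-in for the real page, not its actual markup.

```python
import json
from bs4 import BeautifulSoup

# Shortened stand-in for the real page (an assumption, for illustration only).
html = '''
<script class="yoast-schema-graph yoast-schema-graph--main" type="application/ld+json">
{"@context": "https://schema.org",
 "@graph": [{"@type": "WebPage",
             "name": "Baked White Pepper Chicken Wings | The Woks of Life"}]}
</script>
'''

soup = BeautifulSoup(html, "html.parser")
tag = soup.find("script", {"class": "yoast-schema-graph"})

# Tag.get() reads HTML attributes; the tag has no "name" attribute.
attr_name = tag.get("name")
# "type" is a real attribute on the tag, so this lookup succeeds.
attr_type = tag.get("type")

# The JSON lives in the tag's text content and has to be parsed first.
data = json.loads(tag.string)
page_name = data["@graph"][0]["name"]

print(attr_name)
print(attr_type)
print(page_name)
```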

The data is in JSON format, so once you have located the <script> tag, you can parse it with the json module. For example:

import json
import requests
from bs4 import BeautifulSoup

url = 'https://thewoksoflife.com/baked-white-pepper-chicken-wings/'

soup = BeautifulSoup(requests.get(url).text, 'html.parser')

data = json.loads( soup.select_one('script.yoast-schema-graph.yoast-schema-graph--main').text )
# print(json.dumps(data, indent=4))  # <-- uncomment this to print all data

recipe = next((g for g in data['@graph'] if g.get('@type', '') == 'Recipe'), None)
if recipe:
    print('Name        =', recipe['name'])
    print('Cook Time   =', recipe['cookTime'])
    print('Ingredients =', recipe['recipeIngredient'])
    # ... etc.

Prints:

Name        = Baked White Pepper Chicken Wings
Cook Time   = PT40M
Ingredients = ['3 pounds whole chicken wings ((about 14 wings))', '1-2 tablespoons white pepper powder ((divided))', '2 teaspoons salt ((divided))', '1 teaspoon Sichuan peppercorn powder ((optional))', '2 teaspoons vegetable oil ((plus more for brushing))', '1/2 cup all purpose flour', '1/4 cup cornstarch']
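
Since the stated goal is to store each recipe in a CSV file, here is one possible sketch of that last step using only the standard library. The helper name recipe_to_row, the column names, and the "; " separator for ingredients are assumptions made for illustration; the sample dict is an abbreviated copy of the fields printed above. It writes to an in-memory buffer so it can be run without touching the disk.

```python
import csv
import io

def recipe_to_row(recipe):
    """Flatten a Recipe dict into a flat dict suitable for csv.DictWriter.

    Hypothetical helper: missing keys become empty strings, and the
    ingredient list is joined into one cell.
    """
    return {
        "name": recipe.get("name", ""),
        "cook_time": recipe.get("cookTime", ""),
        "ingredients": "; ".join(recipe.get("recipeIngredient", [])),
    }

# Abbreviated sample based on the output above (ingredient list shortened).
sample = {
    "@type": "Recipe",
    "name": "Baked White Pepper Chicken Wings",
    "cookTime": "PT40M",
    "recipeIngredient": ["1/2 cup all purpose flour", "1/4 cup cornstarch"],
}

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["name", "cook_time", "ingredients"])
writer.writeheader()
writer.writerow(recipe_to_row(sample))

csv_text = buffer.getvalue()
print(csv_text)
```

To write a real file, the buffer could be swapped for open("recipes.csv", "w", newline="", encoding="utf-8"), where the file name is again only an example.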

