How do I get data from a website with BeautifulSoup and requests?

Question

I'm a beginner at web scraping and I need help with this problem. allrecipes.com is a site where you can find recipes by searching. In this case the search is for "pie":

Link to the HTML source: view-source:https://www.allrecipes.com/search/results/?wt=pie&sort=re (right-click -> View Page Source)
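The search URL is a base path plus two query parameters: `wt` (the search text) and `sort`. Gluing the raw input onto the URL, as the code below does, breaks for queries that contain spaces or `&`. Here is a minimal standard-library sketch that encodes the query first. `build_search_url` is a hypothetical helper name, and `sort=re` is simply copied from the URL above because its meaning isn't documented here.

```python
from urllib.parse import urlencode

# Base path taken from the search URL in the question
SEARCH_URL = 'https://www.allrecipes.com/search/results/'


def build_search_url(query: str, sort: str = 're') -> str:
    """Build the search URL for `query`, keeping the question's sort=re parameter."""
    # urlencode percent-encodes the text: spaces become '+' and '&' becomes '%26'
    return SEARCH_URL + '?' + urlencode({'wt': query, 'sort': sort})


print(build_search_url('apple pie'))
# https://www.allrecipes.com/search/results/?wt=apple+pie&sort=re
```

Passing a dict through `params=` to `requests.get` gives the same encoding without building the string yourself.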

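I want to write a program that takes an input, searches allrecipes for it, and returns a list of tuples for the top five recipes, with data such as cooking time, servings, and ingredients. Here is my program so far: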

import requests
from bs4 import BeautifulSoup

def searchdata():
    inp = input('What recipe would you like to search? ')
    url = 'http://www.allrecipes.com/search/results/?wt=' + str(inp) + '&sort=re'
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'html.parser')
    links = []

    # fill in code for finding the top 3 or 5 links

    # results collected across all recipes (created once, not reset on every pass of the loop)
    names = []
    time = []
    servings = []
    ratings = []
    ingredients = []

    for i in range(3):
        a = requests.get(links[i])
        soupa = BeautifulSoup(a.text, 'html.parser')

        # fill in code to find name, ingredients, time, and serving size with data from soupa


searchdata()

Yes, I know my code is messy, but what should I put in the two fill-in sections? Thanks.
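Both fill-in sections come down to parsing HTML that has already been downloaded. Here is a minimal sketch of each one as a standalone function that works on an HTML string, so it can be tried without hitting the site. `top_recipe_links` and `parse_recipe` are hypothetical names. The class names (`fixed-recipe-card__h3`, `recipe-meta-item-header`, `recipe-meta-item-body`, `ingredients-item-name`) are taken from the answer's code, which reflects allrecipes' markup as of mid-2020. Treat them as assumptions and check them against the current page source.

```python
from typing import Dict, List, Tuple

from bs4 import BeautifulSoup


def top_recipe_links(search_html: str, limit: int = 5) -> List[Tuple[str, str]]:
    """First fill-in: up to `limit` (name, url) pairs from a search-results page."""
    soup = BeautifulSoup(search_html, 'html.parser')
    results = []
    # Assumed markup: <h3 class="fixed-recipe-card__h3"><a href=...><span>Name</span></a></h3>
    for h3 in soup.find_all('h3', class_='fixed-recipe-card__h3'):
        if len(results) >= limit:
            break
        link = h3.find('a', href=True)
        if link is not None:  # skip cards without a usable link
            results.append((link.get_text(' ', strip=True), link['href']))
    return results


def parse_recipe(name: str, recipe_html: str) -> Tuple[str, Dict[str, str], List[str]]:
    """Second fill-in: one (name, meta, ingredients) tuple for a recipe page.

    `meta` maps lower-cased labels such as "servings:" to their values.
    """
    soup = BeautifulSoup(recipe_html, 'html.parser')
    # Assumed class names for the time/servings block and the ingredient list
    labels = [div.get_text(' ', strip=True).lower()
              for div in soup.find_all('div', class_='recipe-meta-item-header')]
    values = [div.get_text(' ', strip=True)
              for div in soup.find_all('div', class_='recipe-meta-item-body')]
    ingredients = [span.get_text(' ', strip=True)
                   for span in soup.find_all('span', class_='ingredients-item-name')]
    # zip pairs the n-th label with the n-th value in page order
    return name, dict(zip(labels, values)), ingredients


if __name__ == '__main__':
    search_page = ('<h3 class="fixed-recipe-card__h3"><a href="/r/1"><span>Apple Pie</span></a></h3>'
                   '<h3 class="fixed-recipe-card__h3"><a href="/r/2"><span>Cherry Pie</span></a></h3>')
    print(top_recipe_links(search_page, limit=1))  # [('Apple Pie', '/r/1')]
    recipe_page = ('<div class="recipe-meta-item-header">Servings:</div>'
                   '<div class="recipe-meta-item-body">8</div>'
                   '<span class="ingredients-item-name">2 cups flour</span>')
    print(parse_recipe('Apple Pie', recipe_page))
```

In `searchdata()`, the first function would fill `links` from `r.text`. Inside the loop, `parse_recipe(name, a.text)` would produce one tuple per recipe, which you can append to a results list.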

Answer 1

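After searching for recipes, you have to collect the link to each recipe and then request each link, because the information you want isn't on the search results page. Doing that without OOP gets messy, so here is a class I wrote that does what you need.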

import requests
from time import sleep
from bs4 import BeautifulSoup


class Scraper:
    def __init__(self):
        # Instance attributes instead of class-level lists, so two scrapers never share state
        self.links = []
        self.names = []
        self.soup = None

    def get_url(self, url, params=None):
        # Download a page and keep its parsed HTML in self.soup
        response = requests.get(url, params=params)
        self.soup = BeautifulSoup(response.content, 'html.parser')

    def print_info(self, name):
        # requests URL-encodes the query, so searches like "apple pie" work too
        self.get_url('https://www.allrecipes.com/search/results/',
                     params={'wt': name, 'sort': 're'})
        if self.soup.find('span', class_='subtext').text.strip()[0] == '0':
            print(f'No recipes found for {name}')
            return
        # Keep the first five result cards that have a recipe title
        results = self.soup.find('section', id='fixedGridSection')
        articles = results.find_all('article')
        texts = []
        for article in articles:
            txt = article.find('h3', class_='fixed-recipe-card__h3')
            if txt:
                if len(texts) < 5:
                    texts.append(txt)
                else:
                    break
        self.links = [txt.a['href'] for txt in texts]
        self.names = [txt.a.span.text for txt in texts]
        self.get_data()

    def get_data(self):
        # The details are only on each recipe's own page, so request every link
        for name, link in zip(self.names, self.links):
            self.get_url(link)
            print('-' * 4 + name + '-' * 4)
            info_names = [div.text.strip() for div in self.soup.find_all(
                'div', class_='recipe-meta-item-header')]
            info_values = [div.text.strip() for div in self.soup.find_all(
                'div', class_='recipe-meta-item-body')]
            # Pair each label (e.g. "prep:", "servings:") with its value
            for label, value in zip(info_names, info_values):
                print(label.capitalize(), value)
            ingredient_spans = self.soup.find_all('span', class_='ingredients-item-name')
            ingredients = [span.text.strip() for span in ingredient_spans]
            print()
            if ingredients:  # avoid an IndexError on pages without an ingredient list
                print('Ingredients'.center(len(ingredients[0]), ' '))
                print('\n'.join(ingredients))
            print()
            print('*' * 50, end='\n\n')
            sleep(1)  # short pause between page requests


scraper = Scraper()
scraper.print_info(input('What recipe would you like to search: '))
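Each recipe needs its own request, so a scraper like this sends several requests in a row. Here is a sketch of a download helper that reuses one `requests.Session`, sets a timeout, fails loudly on HTTP errors, and pauses between pages. `download_html` and `fetch_pages` are hypothetical names, and the 10-second timeout and 1-second delay are illustrative choices, not part of the answer.

```python
import functools
import time
from typing import Callable, Iterable, List, Optional

import requests


def download_html(url: str, session: requests.Session, timeout: float = 10.0) -> str:
    """Fetch one page with a shared session; raise requests.HTTPError on 4xx/5xx."""
    response = session.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text


def fetch_pages(urls: Iterable[str],
                get_html: Optional[Callable[[str], str]] = None,
                delay: float = 1.0) -> List[str]:
    """Download each URL in order, sleeping `delay` seconds between requests."""
    if get_html is None:
        # One session reuses the underlying connection across all recipe pages
        get_html = functools.partial(download_html, session=requests.Session())
    pages = []
    for index, url in enumerate(urls):
        if index > 0:
            time.sleep(delay)  # pause before every request after the first
        pages.append(get_html(url))
    return pages
```

`fetch_pages(links)` returns the HTML strings in the same order as the links, ready for `BeautifulSoup`. The `get_html` parameter lets you pass in a stub (for example, one that reads saved pages from disk) while you work on the parsing code.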


Solution 1 (score 0), 2020-06-20 21:28:19
