
How do I web scrape using Python and BeautifulSoup?

I'm very new to this, but I have an idea for a website and I want to give it a good go. My aim is to scrape the Asda website for products and prices, more specifically, in this case, whiskey. I want to grab the name and price of every whiskey on the Asda website and put them into a nice table on my website. However, I am having problems doing so: my code so far raises a syntax error. Can anyone help? The code so far is:

import requests 
from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()
driver.get('https://groceries.asda.com/shelf/drinks/spirits-ready-to-drink/spirits/whisky/1579926650')

res = driver.execute_script('return document.documentElement.outerHTML')

html_soup = BeautifulSoup(res, 'html.parser')
type(html_soup)

driver.quit

response = requests.get('https://groceries.asda.com/shelf/drinks/spirits-ready-to-drink/spirits/whisky/1579926650'


whiskey_container = html_soup.find('div', {'class': 'co-product-lazy-container'})

for whiskey in whiskey_container:
    name = whiskey.find('a', {'class': 'co-product__anchor'})
    price = whiskey.find('div', {'class': 'co-product__price'})

    print(name, price)

You have a syntax error: the closing ")" is missing on this line:

response = requests.get('https://groceries.asda.com/shelf/drinks/spirits-ready-to-drink/spirits/whisky/1579926650'

It should be:

response = requests.get('https://groceries.asda.com/shelf/drinks/spirits-ready-to-drink/spirits/whisky/1579926650')
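A quick way to confirm this kind of mistake is to compile the source without running it. The sketch below is only an illustration: check_syntax is a hypothetical helper, and the URL is a shortened placeholder rather than the real Asda address. Depending on the Python version, the error may be reported on the line after the unclosed call rather than on the call itself, which is why the message can look as if it points at the wrong line.

```python
# The broken statement from the question (placeholder URL), plus the line after it.
broken = (
    "response = requests.get('https://example.invalid/whisky'\n"
    "\n"
    "whiskey_container = html_soup.find('div')\n"
)


def check_syntax(source):
    """Return None if the source compiles, else (line number, message)."""
    try:
        compile(source, "<snippet>", "exec")  # parses only; nothing is executed
    except SyntaxError as err:
        return err.lineno, err.msg
    return None


problem = check_syntax(broken)
# Add the missing ")" and check again.
fixed = check_syntax(broken.replace("whisky'\n", "whisky')\n", 1))

print("before:", problem)
print("after: ", fixed)
```

Running `python -m py_compile yourfile.py` from a terminal performs a similar check on a whole file.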

-- --

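By the way, your code still won't work. It has a couple of logical errors: for example, driver.quit is written without parentheses, so the browser is never closed, and the response from requests.get is never used. I also doubt you can scrape that page with your current code.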
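One of the logical errors is probably the loop over the result of find(): find() returns a single element (or None), so the for loop walks over that element's children instead of over a list of products. Below is a minimal sketch of the difference between find() and find_all(). The HTML is made up for illustration and only borrows the class names used in this thread; the live Asda markup may be structured differently.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration; not copied from the real page.
html = """
<div class="co-product-lazy-container">
  <div class="co-item">
    <a class="co-product__anchor">Example Whisky A</a>
    <div class="co-product__price">£20.00</div>
  </div>
  <div class="co-item">
    <a class="co-product__anchor">Example Whisky B</a>
    <div class="co-product__price">£25.50</div>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns only the first matching Tag (or None if nothing matches).
first = soup.find("a", {"class": "co-product__anchor"})

# find_all() returns every match, which is what a loop over products needs.
products = []
for item in soup.find_all("div", {"class": "co-item"}):
    name = item.find("a", {"class": "co-product__anchor"})
    price = item.find("div", {"class": "co-product__price"})
    # get_text(strip=True) yields the text content rather than the whole tag
    products.append((name.get_text(strip=True), price.get_text(strip=True)))

print(first.get_text(strip=True))
print(products)
```

On a real page it is worth checking each find() result for None before calling get_text(), because a product tile may lack one of the elements.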

Try this:

import csv   # for saving the data as a table
import time  # fixed pauses while the lazy-loaded products render

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait  # explicit waits

# CSS selector for a single product tile
PRODUCTS = ".co-lazy-product-container .co-item"


def save_csv(dct):
    '''
    Append one product to the CSV file.

    dct - dictionary with our data: "cap", "title", "price"
    '''
    name = "file.csv"  # output file name; choose whatever you like
    print("[INFO] saving...")  # shows that the function is running
    # "a" appends to the file; newline="" avoids blank rows on Windows
    with open(name, "a", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow((dct["cap"], dct["title"], dct["price"]))


def scroll(driver):
    # scroll down in steps so that the lazy-loaded products are rendered
    for i in range(1, 6):
        # scroll 1000 px further on each pass
        driver.execute_script("window.scrollTo(0, arguments[0])", 1000 * i)
        time.sleep(7)


driver = webdriver.Firefox()
driver.get("https://groceries.asda.com/shelf/drinks/spirits-ready-to-drink/spirits/whisky/1579926650?facets=shelf%3A1579926650%3A0000&nutrition=&sortBy=&page=0")

for i in range(2):  # 2 because there are only two pages of data
    # wait up to 30 seconds for the first product tile to appear
    WebDriverWait(driver, 30).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, PRODUCTS))
    )
    scroll(driver)  # reveal all the products we are interested in

    # collect every product tile as a Selenium WebElement
    data = driver.find_elements(By.CSS_SELECTOR, PRODUCTS)

    # build a dictionary from each tile's text and save it
    for d in data:
        body = d.text.split("\n")
        items = {"cap": body[0], "title": body[1], "price": body[-2]}
        save_csv(items)

    # pagination: click through to the last page
    driver.find_element(By.CSS_SELECTOR, ".co-pagination__last-page").click()

# close the browser
driver.quit()
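The indexing in the loop above (body[0], body[1], body[-2]) assumes every tile has the same text layout, so a tile with fewer lines would raise an IndexError or mix up the fields. The sketch below shows one way to guard against that and to write a header row. parse_tile and write_rows are hypothetical helper names, the tile text is made up, and unlike the loop above this version drops blank lines before indexing.

```python
import csv
import io

FIELDS = ["cap", "title", "price"]


def parse_tile(text):
    """Split one product tile's text into the three saved fields.

    Returns None when the tile has fewer than four non-empty lines,
    because body[1] and body[-2] would then overlap or be missing.
    """
    body = [line.strip() for line in text.split("\n") if line.strip()]
    if len(body) < 4:
        return None
    return {"cap": body[0], "title": body[1], "price": body[-2]}


def write_rows(rows, handle):
    """Write a header plus one row per product to an open text file."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# Made-up tile text standing in for d.text; real tiles may look different.
tiles = [
    "70cl\nExample Whisky A\n£20.00\nAdd",
    "Sold out",
]
rows = [row for row in (parse_tile(t) for t in tiles) if row is not None]

buffer = io.StringIO()  # stands in for open("file.csv", "w", newline="")
write_rows(rows, buffer)
print(buffer.getvalue())
```

Collecting the rows first and writing them in one go also avoids reopening the file for every product.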

