How to detect changes on a website? Python web scraping

I started writing a Python monitor for a shoe website. Now I would like to know whether there is a way to tell when the site is updated. For example: if the available shoe sizes change -> send a webhook to my Discord.

I don't know how to detect changes on the site, so please help me. If you have an idea, let me know :)

from dhooks import Webhook, Embed
import requests
from bs4 import BeautifulSoup  # the "lxml" parser used below must be installed

# Placeholder: the original post never defines `hook`; put your own webhook URL here.
hook = Webhook("YOUR_DISCORD_WEBHOOK_URL")

url = "https://en.aw-lab.com/women/shoes/new-arrivals-AW_10008AAQB.html?cgid=women_shoes_newin&dwvar_AW__10008AAQB_color=5011614"

headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}

# Download the product page and parse it
res = requests.get(url, headers=headers)
res.raise_for_status()
soup = BeautifulSoup(res.text, "lxml")
img_shoes = "https://en.aw-lab.com/dw/image/v2/BCLG_PRD/on/demandware.static/-/Sites-awlab-master-catalog/default/dwd9415a8e/images/large/5011614_0.jpg?sw=843"

# Collect the text of every size element on the page
size = soup.select(".b-size-selector__item-0")
array_size = []
for sizes in size:
    get_sizes = sizes.getText()
    array_size.append(get_sizes.strip())

# Markdown link shown in the embed description
url_shoes = "[ADIDAS SUPERSTAR BOLD](" + url + ")"

embed = Embed(
    description=url_shoes,
    color=0x5CDBF0,
    timestamp='now'
)

embed.add_field(name="Size", value='\n'.join(array_size))

embed.set_thumbnail(img_shoes)

hook.send(embed=embed)

You can use the hashlib module to compute a checksum of the page, save it, and then compute it again on the next request to check whether it changed. NOTE: any change to the page, however subtle, will change the checksum!

import hashlib

# ...

checksum = hashlib.sha256(res.text.encode('utf-8')).hexdigest()

# save it to a txt file as a comparison for the next accesses
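
To make the "save it and compare" step concrete, here is a minimal sketch. It is only one possible approach, not the answer's exact method: instead of hashing the whole page, it hashes just the extracted size list (such as `array_size` from the question), so unrelated page changes do not trigger an alert. The function names `sizes_checksum` and `check_for_change`, the JSON state file, and its layout are all assumptions made for illustration.

```python
import hashlib
import json
from pathlib import Path


def sizes_checksum(sizes):
    """Return a SHA-256 hex digest of the sizes, ignoring order, duplicates and whitespace."""
    normalized = sorted({s.strip() for s in sizes if s.strip()})
    return hashlib.sha256("\n".join(normalized).encode("utf-8")).hexdigest()


def check_for_change(sizes, state_file):
    """Compare the current sizes with the saved state, then save the current state.

    Returns (changed, added, removed). The first run, when no state file
    exists yet, only records a baseline and reports no change.
    """
    path = Path(state_file)  # hypothetical location of the saved state
    current = sorted({s.strip() for s in sizes if s.strip()})

    previous = None
    if path.exists():
        previous = json.loads(path.read_text(encoding="utf-8"))

    # Persist the new state for the next run
    state = {"checksum": sizes_checksum(current), "sizes": current}
    path.write_text(json.dumps(state), encoding="utf-8")

    if previous is None:
        return False, [], []

    changed = previous["checksum"] != state["checksum"]
    added = sorted(set(current) - set(previous["sizes"]))
    removed = sorted(set(previous["sizes"]) - set(current))
    return changed, added, removed
```

In the monitor, you would presumably call `check_for_change(array_size, "sizes_state.json")` after each request, for example in a loop with `time.sleep(...)` between requests, and call `hook.send(embed=embed)` only when `changed` is true.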
