How to detect changes on a website? Python web scraping
I started writing a monitor in Python for a shoe website. Now I would like to know if there is a way to tell when the site is updated.
For example: if there is a change in the available shoe sizes -> send a webhook to my Discord.
I don't know how to detect changes on the site. Please help me. If you have an idea, let me know :)
from dhooks import Webhook, Embed
import requests
from bs4 import BeautifulSoup

# Placeholder: the original snippet never defines `hook`; use your own Discord webhook URL.
hook = Webhook("YOUR_DISCORD_WEBHOOK_URL")

url = "https://en.aw-lab.com/women/shoes/new-arrivals-AW_10008AAQB.html?cgid=women_shoes_newin&dwvar_AW__10008AAQB_color=5011614"
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'
}

# Download the product page and parse it (the "lxml" parser needs the lxml package installed)
res = requests.get(url, headers=headers)
res.raise_for_status()
soup = BeautifulSoup(res.text, "lxml")

img_shoes = "https://en.aw-lab.com/dw/image/v2/BCLG_PRD/on/demandware.static/-/Sites-awlab-master-catalog/default/dwd9415a8e/images/large/5011614_0.jpg?sw=843"

# Collect the text of every size element on the page
size = soup.select(".b-size-selector__item-0")
array_size = []
for sizes in size:
    get_sizes = sizes.getText()
    array_size.append(get_sizes.strip())

# Build the Discord embed and send it through the webhook
url_shoes = "[ADIDAS SUPERSTAR BOLD](" + url + ")"
embed = Embed(
    description=url_shoes,
    color=0x5CDBF0,
    timestamp='now'
)
embed.add_field(name="Size", value=('\n'.join(map(str, array_size))))
embed.set_thumbnail(img_shoes)
hook.send(embed=embed)
You can use the hashlib module to compute a checksum of the page, save it, and then compute it again on the next request to check whether it changed.
NOTE: any subtle change will change the checksum! That includes changes to parts of the page that have nothing to do with the shoe sizes.
import hashlib
# ...
checksum = hashlib.sha256(res.text.encode('utf-8')).hexdigest()
# save it to a txt file as a comparison for the next accesses
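
One way the "save it and compare on the next access" step might look is sketched below. The helper names (`checksum_of`, `has_changed`, `sizes_fingerprint`) and the state file are made up for illustration and are not part of the answer above. Because any subtle change alters the checksum, the sketch also shows how to feed only the extracted sizes into the hash instead of the whole page.

```python
import hashlib
from pathlib import Path
from typing import Iterable


def checksum_of(text: str) -> str:
    """Return the SHA-256 hex digest of a string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def has_changed(text: str, state_file: Path) -> bool:
    """Compare the checksum of `text` with the one stored in `state_file`.

    Returns True only when a previous checksum exists and differs from the
    new one. The new checksum is always written back, so the next call
    compares against the most recent version.
    """
    new = checksum_of(text)
    old = None
    if state_file.exists():
        old = state_file.read_text(encoding="utf-8").strip()
    state_file.write_text(new, encoding="utf-8")
    return old is not None and old != new


def sizes_fingerprint(sizes: Iterable[str]) -> str:
    """Join the size labels into one string so that only they are hashed."""
    return "\n".join(sizes)
```

With the script from the question, this could be used as `if has_changed(sizes_fingerprint(array_size), Path("sizes.sha256")): hook.send(embed=embed)`, with the script run on a schedule (a loop with `time.sleep`, or cron). The file name `sizes.sha256` is only an example. The first run reports no change because there is nothing to compare against yet.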