How to compare variables if not HTTP 200 status
I have written a web scraper that compares two values to see whether anything has been added between the previous request and the new one.
import json
import re
import time
from dataclasses import dataclass
from typing import Optional, List

import requests
from bs4 import BeautifulSoup


@dataclass
class Product:
    name: Optional[str]
    price: Optional[str]
    image: Optional[str]
    sizes: List[str]

    @staticmethod
    def get_sizes(doc: BeautifulSoup) -> List[str]:
        pat = re.compile(
            r'^<script>var JetshopData='
            r'(\{.*\})'
            r';</script>$',
        )
        for script in doc.find_all('script'):
            match = pat.match(str(script))
            if match is not None:
                break
        else:
            return []

        data = json.loads(match[1])
        return [
            variation
            for get_value in data['ProductInfo']['Attributes']['Variations']
            if get_value.get('IsBuyable')
            for variation in get_value['Variation']
        ]

    @classmethod
    def from_page(cls, url: str) -> Optional['Product']:
        with requests.get(url) as response:
            response.raise_for_status()
            doc = BeautifulSoup(response.text, 'html.parser')

        name = doc.select_one('h1.product-page-header')
        price = doc.select_one('span.price')
        image = doc.select_one('meta[property="og:image"]')

        return cls(
            name=name and name.text.strip(),
            price=price and price.text.strip(),
            image=image and image['content'],
            sizes=cls.get_sizes(doc),
        )


def main():
    product = Product.from_page("https://shelta.se/sneakers/nike-air-zoom-type-whiteblack-cj2033-103")
    previous_request = product.sizes

    while True:
        product = Product.from_page("https://shelta.se/sneakers/nike-air-zoom-type-whiteblack-cj2033-103")

        if set(product.sizes) - set(previous_request):
            print("new changes on the webpage")
            previous_request = product.sizes
        else:
            print("No changes made")

        time.sleep(500)


if __name__ == '__main__':
    main()
The problem I am facing is that the product page can be taken down. For example, suppose I have found the sizes
['US 9,5/EUR 43', 'US 10,5/EUR 44,5']
and the admin then takes the page down, so that it returns 404. A few hours later they put the page back with the same values ['US 9,5/EUR 43', 'US 10,5/EUR 44,5']. My script would not print anything, because those values were already present in the previous valid request.
What would be the best way to print the values when a page goes from 404 back to 200, even if the same values are added again?
The use of response.raise_for_status() is incorrect in this case. It simply raises an exception if the website returns a 404, 500 or similar, which exits your program. Replace response.raise_for_status() with:
if response.status_code != 200:
    return cls(None, None, None, None)
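To see the effect of this check without touching the network, here is a minimal sketch. `FakeResponse` and `product_from_response` are hypothetical stand-ins (they are not part of the original code or of `requests`); only the `status_code` attribute mirrors the real `requests.Response`, and the HTML parsing is skipped by passing the sizes in directly.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Product:
    name: Optional[str]
    price: Optional[str]
    image: Optional[str]
    sizes: Optional[List[str]]  # None marks a page that did not return 200


@dataclass
class FakeResponse:
    # Hypothetical stand-in for requests.Response; only status_code is modelled.
    status_code: int


def product_from_response(response: FakeResponse, sizes: List[str]) -> Product:
    # Any non-200 status yields an "empty" product instead of raising.
    if response.status_code != 200:
        return Product(None, None, None, None)
    # Assumption: parsing is skipped here, so placeholder values are used.
    return Product("example", "0", None, sizes)


up = product_from_response(FakeResponse(200), ['US 9,5/EUR 43'])
down = product_from_response(FakeResponse(404), ['US 9,5/EUR 43'])

print(up.sizes)    # ['US 9,5/EUR 43']
print(down.sizes)  # None
```

Because the empty product carries `None` instead of a list, it compares unequal to any real list of sizes, which is what lets a later comparison notice both the takedown and the return of the page.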
EDIT, as I misinterpreted the question:
An empty product is now returned if an error occurred. The only check required now is whether the sizes have changed.
def main():
    url = "https://shelta.se/sneakers/nike-air-zoom-type-whiteblack-cj2033-103"
    previous_product = Product.from_page(url)

    while True:
        product = Product.from_page(url)

        if product.sizes != previous_product.sizes:
            print("new changes on the webpage")
        else:
            print("No changes made")

        previous_product = product
        time.sleep(500)
previous_product has been moved outside the if statement. In this exact case it does not matter, but it improves readability.
The use of set(...) - set(...) has been removed, as it only catches when something is added to the website, not when something is removed. If something was first removed and then re-added, it would not have been caught by your program either.
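The difference between the two comparison strategies can be sketched on a simulated sequence of polls. The function names and the `polls` list below are illustrative assumptions, not part of the original code; `None` models a poll that did not return HTTP 200, and the set-difference variant treats such a poll as having no sizes, since the original loop would have raised instead.

```python
from typing import List, Optional, Sequence

Sizes = Optional[List[str]]  # None models a poll that did not return HTTP 200


def changes_by_set_difference(polls: Sequence[Sizes]) -> List[bool]:
    """Question's approach: report only when a new size appears."""
    previous = polls[0] or []
    results = []
    for current in polls[1:]:
        current = current or []  # treat a failed poll as "no sizes"
        changed = bool(set(current) - set(previous))
        if changed:
            previous = current  # only updated on additions, as in the question
        results.append(changed)
    return results


def changes_by_equality(polls: Sequence[Sizes]) -> List[bool]:
    """Answer's approach: report whenever the sizes differ from the last poll."""
    previous = polls[0]
    results = []
    for current in polls[1:]:
        results.append(current != previous)
        previous = current  # always updated, changed or not
    return results


sizes = ['US 9,5/EUR 43', 'US 10,5/EUR 44,5']
polls = [sizes, None, sizes]  # 200, then 404, then 200 with the same sizes

print(changes_by_set_difference(polls))  # [False, False]
print(changes_by_equality(polls))        # [True, True]
```

Note that comparing lists with `!=` is order-sensitive, so if the site could return the same sizes in a different order, comparing `set(...)` values for equality may be the safer choice.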