
How should I optimize/structure my database for collecting webshop price history?

I have a small personal project that collects data from several webshops. Every night a cron job runs a script that uses the Simple HTML DOM Parser for PHP to fetch the prices of the products in selected product groups.

Right now my database consists of three tables:
- stores: name, URL, etc. for each webshop
- products: URL, product name, etc. for each product
- prices: the price of each product for every day
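A minimal sketch of the three-table layout described above, written in Python with SQLite for brevity (the scraper itself is PHP, but the same DDL runs through PHP's PDO with minor dialect changes). Only the table names come from the question; every column name is an assumption for illustration.

```python
import sqlite3

# Hypothetical column names; only the three table names come from the question.
SCHEMA = """
CREATE TABLE IF NOT EXISTS stores (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    url  TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS products (
    id       INTEGER PRIMARY KEY,
    store_id INTEGER NOT NULL REFERENCES stores(id),
    name     TEXT NOT NULL,
    url      TEXT NOT NULL UNIQUE
);
-- One row per product per nightly run (the current design).
CREATE TABLE IF NOT EXISTS prices (
    product_id INTEGER NOT NULL REFERENCES products(id),
    scraped_on TEXT    NOT NULL,  -- ISO date, e.g. 2024-01-31
    price      INTEGER,           -- price in cents, avoids float rounding
    PRIMARY KEY (product_id, scraped_on)
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the three tables if they do not exist yet."""
    conn.executescript(SCHEMA)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    print(tables)
```

For scale: with about 300 products, one row per day is roughly 300 × 365 = 109,500 rows a year, which an indexed table handles comfortably. The composite primary key doubles as the index for per-product lookups.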

My question concerns the prices table. Every time the cron script runs, it inserts a new row with price data for each product (300+), even if the product's price is unchanged. I know I could avoid saving redundant data by checking whether the price has changed. But a product can also go out of stock, and then I would have no record of when that happened (which I would have if I saved the price every day). How would you do this more efficiently? Because of the DOM parsing, the cron script may take a long time to run, and I want to be sure everything is parsed and added to the database as expected.
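One common alternative to a row per day is to store a row only when the price or stock status changes, together with the first and last date on which that state was observed. This keeps the out-of-stock information the question is concerned about, because stock status is part of what counts as a change. The sketch below is in Python with SQLite; the `price_history` table and the `record_observation`/`nightly_run` functions are hypothetical names, and it assumes at most one scrape per product per day. Wrapping the whole night in one transaction addresses the "everything added as expected" concern: either all rows are written or none are.

```python
import sqlite3

# Hypothetical table: one row per unchanged stretch of observations
# instead of one row per day.
HISTORY_SCHEMA = """
CREATE TABLE IF NOT EXISTS price_history (
    product_id INTEGER NOT NULL,
    price      INTEGER,          -- cents; may be NULL while out of stock
    in_stock   INTEGER NOT NULL, -- 1 = in stock, 0 = out of stock
    first_seen TEXT    NOT NULL, -- ISO date this state was first observed
    last_seen  TEXT    NOT NULL, -- ISO date this state was last observed
    PRIMARY KEY (product_id, first_seen)
);
"""


def record_observation(conn, product_id, day, price, in_stock):
    """Extend the latest row if nothing changed, otherwise start a new row.

    Assumes at most one run per day (first_seen is part of the primary key).
    """
    latest = conn.execute(
        "SELECT first_seen, price, in_stock FROM price_history "
        "WHERE product_id = ? ORDER BY first_seen DESC LIMIT 1",
        (product_id,),
    ).fetchone()
    if latest is not None and (latest[1], latest[2]) == (price, int(in_stock)):
        conn.execute(
            "UPDATE price_history SET last_seen = ? "
            "WHERE product_id = ? AND first_seen = ?",
            (day, product_id, latest[0]),
        )
    else:
        conn.execute(
            "INSERT INTO price_history "
            "(product_id, price, in_stock, first_seen, last_seen) "
            "VALUES (?, ?, ?, ?, ?)",
            (product_id, price, int(in_stock), day, day),
        )


def nightly_run(conn, day, observations):
    """Store all (product_id, price, in_stock) tuples atomically.

    `with conn:` commits if every row succeeds and rolls back on any
    exception, so a crashed run leaves no half-written night behind.
    """
    with conn:
        for product_id, price, in_stock in observations:
            record_observation(conn, product_id, day, price, in_stock)
```

The price on a given date `d` is then the row with `first_seen <= d AND last_seen >= d`. A product that disappears from the shop entirely simply stops having its `last_seen` advanced, which also records when it was last seen.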

I guess you could keep track of each DOM you parse and store a checksum of it, so you can see whether it has changed when you load it again the next night. If the checksum is the same, you know you need neither parsing nor updating, because nothing has changed.
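The checksum idea can be sketched as follows, again in Python with SQLite and with hypothetical names (`page_checksums`, `needs_parsing`, `remember_checksum`). SHA-256 from the standard `hashlib` module is used here as the checksum; any stable hash would do. The digest is saved only after a page has been parsed and stored successfully, so a run that crashes halfway is retried the next night instead of being skipped.

```python
import hashlib
import sqlite3

# Hypothetical table holding the last successfully processed digest per page.
CHECKSUM_SCHEMA = """
CREATE TABLE IF NOT EXISTS page_checksums (
    url    TEXT PRIMARY KEY,
    sha256 TEXT NOT NULL
);
"""


def needs_parsing(conn, url, html):
    """Return (changed, digest) for the freshly downloaded HTML of `url`."""
    digest = hashlib.sha256(html.encode("utf-8")).hexdigest()
    row = conn.execute(
        "SELECT sha256 FROM page_checksums WHERE url = ?", (url,)
    ).fetchone()
    return (row is None or row[0] != digest), digest


def remember_checksum(conn, url, digest):
    """Call only after the page's prices were parsed and stored."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO page_checksums (url, sha256) VALUES (?, ?)",
            (url, digest),
        )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(CHECKSUM_SCHEMA)
    url = "https://shop.example/product/1"  # hypothetical URL
    changed, digest = needs_parsing(conn, url, "<html>19.99</html>")
    if changed:
        # ... parse the DOM and store prices here ...
        remember_checksum(conn, url, digest)
    print(needs_parsing(conn, url, "<html>19.99</html>")[0])  # prints False
```

Two caveats: the page still has to be downloaded every night, so the checksum only saves the parsing and database work; and a whole-page hash also changes whenever unrelated content (ads, timestamps, session tokens) changes, so hashing only the extracted price and stock fragment may let more pages be skipped.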

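Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.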

 
© 2020-2024 STACKOOM.COM