Programmatically monitor a webpage

Every project on drupal.org has its own page:

http://drupal.org/project/marinelli

When a new release is made, it gets added to that project's release page:

http://drupal.org/node/185969/release

I'm trying to monitor when that page changes, but of course I don't want to keep checking it manually. I need to do it programmatically with PHP.

  • Do I have to scrape the page? Is this page scrapable?

  • I see an RSS feed, but I'm not sure how it works or whether it can help me with monitoring.

  • Does drupal.org offer a cleaner solution, like an API? Or is there a way to monitor the repository directly?

  • Other solutions are welcome.

There is a core module, "Update Status", that checks whether updates are available for your installed modules. You can either use it directly, if it fits your needs, or read its source to see how the module requests the data.

Instead of trying to scrape the page, a better solution, as you suspected, could be to use its RSS feed. In your case: http://drupal.org/node/185969/release/feed

The advantage is that RSS is a well-defined format: you have far less chance of picking up irrelevant information than when digging through HTML soup.

To extract data from that XML feed, you can use SimpleXML to work with the XML data by hand, or a library like SimplePie that understands RSS/Atom.
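As a rough illustration of the SimpleXML route, the sketch below turns an RSS 2.0 document into a list of plain arrays. The function name `parse_release_feed` is made up for this example, and the code assumes the feed uses the standard `channel`/`item` layout with `title`, `link` and `pubDate` children; check the actual feed before relying on it.

```php
<?php
// Hypothetical helper: parse an RSS 2.0 document and return its items.
// Assumes the standard <rss><channel><item> layout.
function parse_release_feed(string $rssXml): array
{
    $feed = simplexml_load_string($rssXml);
    if ($feed === false) {
        throw new RuntimeException('Could not parse the feed as XML.');
    }

    $items = [];
    foreach ($feed->channel->item as $item) {
        // Cast to string: SimpleXML returns element objects, not text.
        $items[] = [
            'title'   => (string) $item->title,
            'link'    => (string) $item->link,
            'pubDate' => (string) $item->pubDate,
        ];
    }
    return $items;
}

// Usage (fetching over HTTP needs allow_url_fopen, or swap in cURL):
// $xml = file_get_contents('http://drupal.org/node/185969/release/feed');
// foreach (parse_release_feed($xml) as $release) {
//     echo $release['pubDate'], '  ', $release['title'], PHP_EOL;
// }
```

Keeping the parsing separate from the download makes it easy to try the function on a saved copy of the feed first.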

Then, in your case, you have to keep track of the last update: each time you fetch the RSS feed, check whether it contains an entry more recent than the newest one you saw the previous time.

In the XML for your Marinelli module, you'll see that each entry contains a <pubDate> tag that holds its publication date; for example:

<pubDate>Tue, 25 Aug 2009 07:28:26 +0000</pubDate>

If today the most recent entry is from 2009-08-25 and tomorrow there is an entry from 2010-07-27, it means the module has been updated ;-)
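The date comparison described above could be sketched roughly as follows. Both function names and the idea of storing the last seen timestamp in a small text file are assumptions made for illustration; a database row or a Drupal variable would work just as well.

```php
<?php
// Hypothetical helper: newest <pubDate> in the feed as a Unix timestamp,
// or null when the feed contains no parsable dates.
function latest_release_timestamp(string $rssXml): ?int
{
    $feed = simplexml_load_string($rssXml);
    if ($feed === false) {
        throw new RuntimeException('Could not parse the feed as XML.');
    }

    $latest = null;
    foreach ($feed->channel->item as $item) {
        $timestamp = strtotime((string) $item->pubDate);
        if ($timestamp !== false && ($latest === null || $timestamp > $latest)) {
            $latest = $timestamp;
        }
    }
    return $latest;
}

// Hypothetical helper: compare with the timestamp saved by the previous
// run. Updates the state file and returns true only when the feed holds
// an entry newer than the one seen last time.
function has_new_release(string $rssXml, string $stateFile): bool
{
    $latest = latest_release_timestamp($rssXml);
    if ($latest === null) {
        return false;
    }
    $lastSeen = is_file($stateFile) ? (int) file_get_contents($stateFile) : 0;
    if ($latest <= $lastSeen) {
        return false;
    }
    file_put_contents($stateFile, (string) $latest);
    return true;
}
```

Note that the very first run reports a new release, because nothing has been recorded yet. Run from cron, a script would fetch the feed, call `has_new_release()` and send a notification when it returns true.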

What about the site's own feeds? http://drupal.org/node/185969/release/feed Simply subscribe to it in any RSS reader (Google Reader, for example).

What do you mean you need to check it programmatically? Is there a backend that downloads and installs the updates without user interaction?

You can get a project's releases at http://updates.drupal.org/release-history/$project_name/$api_version; see, for example, http://updates.drupal.org/release-history/marinelli/6.x.
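A minimal sketch of using that URL pattern from PHP might look like this. The function names are invented for the example, and the XML layout (a `releases` element containing `release` elements, each with a `version` child) is an assumption that this thread does not confirm; inspect the real response and adjust the element names if they differ.

```php
<?php
// Build the release-history URL from the pattern quoted above.
function release_history_url(string $projectName, string $apiVersion): string
{
    return 'http://updates.drupal.org/release-history/'
        . rawurlencode($projectName) . '/' . rawurlencode($apiVersion);
}

// ASSUMPTION: releases are listed as <releases><release> elements that
// each carry a <version> child. Verify against the actual response.
function release_versions(string $historyXml): array
{
    $project = simplexml_load_string($historyXml);
    if ($project === false) {
        throw new RuntimeException('Could not parse the release history as XML.');
    }

    $versions = [];
    foreach ($project->releases->release as $release) {
        $versions[] = (string) $release->version;
    }
    return $versions;
}

// Usage (fetching over HTTP needs allow_url_fopen, or swap in cURL):
// $xml = file_get_contents(release_history_url('marinelli', '6.x'));
// $new = array_diff(release_versions($xml), $versionsSeenLastTime);
```

Comparing the returned version list with the one saved on the previous run (`$versionsSeenLastTime` is a placeholder) gives the releases that appeared in between.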
