Tracking changes to web page content

I have a requirement to track changes to website content: my users have a list of websites they would like to monitor, and they want to be alerted when the content of these websites is updated. I know there are tools out there that already do this, such as the Firefox add-ons Check4Change and Update Scanner. But I need to do this from my own application and report any updates to the monitored websites to my users from within that application. How can I do this using JavaScript or Java?

What you probably want is a web crawler that runs a quick diff or hash on each page to check for changes. Here's a question about Java web crawler libraries: https://stackoverflow.com/questions/2495289/what-is-a-good-java-web-crawler-library
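A minimal sketch of the fetch-and-hash approach in plain Java, assuming Java 11+ (for `java.net.http.HttpClient`) rather than any particular crawler library. The class name `PageMonitor`, the method `recordAndCheck`, and the 60-second polling interval are hypothetical choices for illustration. The first fetch of each URL is stored as a baseline, and a change is reported only when a later fetch hashes differently. Hashes are kept in memory here; a real application would persist them. Because any byte change alters the hash, pages with dynamic parts (timestamps, ads) would also need some filtering before hashing.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical monitor class: polls pages and flags ones whose MD5 hash changed.
public class PageMonitor {
    private final HttpClient client = HttpClient.newHttpClient();
    // Last seen hash per URL (in memory only; a real app would persist this).
    private final Map<String, String> lastHashes = new HashMap<>();

    // Downloads the page body as text.
    public String fetch(String url) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    // Records the hash of content; returns true only if an earlier hash
    // for this URL exists and differs (the first fetch is just a baseline).
    public boolean recordAndCheck(String url, String content) {
        String hash = md5Hex(content);
        String previous = lastHashes.put(url, hash);
        return previous != null && !previous.equals(hash);
    }

    // Hex-encoded MD5 of the UTF-8 bytes of the content.
    static String md5Hex(String content) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Usage: java PageMonitor https://example.com https://example.org
    public static void main(String[] args) throws InterruptedException {
        PageMonitor monitor = new PageMonitor();
        List<String> urls = List.of(args);
        while (true) {
            for (String url : urls) {
                try {
                    if (monitor.recordAndCheck(url, monitor.fetch(url))) {
                        System.out.println("Changed: " + url); // alert the user here
                    }
                } catch (Exception e) {
                    System.err.println("Failed to fetch " + url + ": " + e.getMessage());
                }
            }
            Thread.sleep(60_000); // hypothetical polling interval
        }
    }
}
```

Keeping `recordAndCheck` separate from `fetch` means the change-detection logic can be tested without network access, and the same check could sit behind a crawler library instead of `HttpClient`.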

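As for hashing, MD5 is pretty fast and, for change detection, makes it extremely unlikely that two different versions of a page produce the same hash (even if they differ only slightly), although no hash can strictly guarantee that. CRC is even quicker, but its shorter checksum (32 bits for CRC32, versus 128 bits for MD5) makes accidental collisions more likely, so it isn't as reliable.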
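To make the trade-off concrete, here is a small Java sketch computing both fingerprints with the standard library's `MessageDigest` and `java.util.zip.CRC32`. The class name `Fingerprints` and the two sample page versions are illustrative only; the sample strings differ by a single character.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.zip.CRC32;

// Illustrative comparison of the two fingerprints discussed above.
public class Fingerprints {
    // 128-bit MD5 digest, hex-encoded.
    static String md5Hex(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // 32-bit CRC32 checksum, returned as an unsigned value in a long.
    static long crc32(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    public static void main(String[] args) {
        String v1 = "<p>Price: 10</p>"; // hypothetical old page content
        String v2 = "<p>Price: 11</p>"; // hypothetical new page content
        System.out.println("MD5   " + md5Hex(v1) + " vs " + md5Hex(v2));
        System.out.printf("CRC32 %08x vs %08x%n", crc32(v1), crc32(v2));
    }
}
```

CRC32 fits in a single `long` and is cheaper to compute, but with only 2^32 possible values, collisions become a realistic concern once many page versions are tracked; MD5's 128-bit output makes an accidental match vanishingly unlikely.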

If none of that works for you, hopefully searching for "Java web crawler" or "JavaScript web crawler" will give you some ideas.
