
Is it possible to write a web crawler in JavaScript?

I want the following functions. I know about the client-side cross-domain restriction in JavaScript, but I don't know whether this restriction applies to what I want the crawler to do.

  1. JavaScript should load the text content of a given website's URL into a div or assign it to a variable.

  2. The tags should then be stripped from the text.

  3. The body of text should be searched for a specific word. If the word is found, the sentence containing the word should be extracted and displayed in an alert message.
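Steps 2 and 3 above can be sketched as two small string helpers. This is only a minimal sketch: the names `stripTags` and `findSentence` are hypothetical, the tag stripping uses simple regular expressions rather than a real HTML parser, and a "sentence" is assumed to end at `.`, `!` or `?`.

```javascript
// Hypothetical helpers for illustration; not a robust HTML parser.

// Step 2: crude tag stripping with regular expressions.
// In a browser, DOMParser plus textContent would be more reliable.
function stripTags(html) {
  return html
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, " ") // drop script/style blocks
    .replace(/<[^>]+>/g, " ")                                // drop remaining tags
    .replace(/\s+/g, " ")                                    // collapse whitespace
    .trim();
}

// Step 3: return the first sentence containing `word`
// (case-insensitive, whole word), or null if there is none.
function findSentence(text, word) {
  const sentences = text.match(/[^.!?]+[.!?]*/g) || [];
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp("\\b" + escaped + "\\b", "i");
  const hit = sentences.find((s) => pattern.test(s));
  return hit ? hit.trim() : null;
}
```

In a browser, the result could then be shown with something like `alert(findSentence(stripTags(html), "crawler"))`, where `html` is the page source obtained in step 1.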

I am writing a Firefox application, so everything must be done on the client side.

As you just said, you cannot use Javascript to retrieve arbitrary content from another domain.

However, you could write a server-side proxy in your own domain which forwards requests to arbitrary URLs and passes along the responses.

The best and easiest thing you can do is:

  • make a dynamic page on your server that accepts a parameter, for example page.php?url=
  • have your JavaScript request that page via AJAX with the URL it needs, so it can retrieve the HTML (through your PHP script) and then parse it on the client in JavaScript
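The client side of this approach might look roughly like the sketch below. It assumes the proxy lives at `page.php` on your own domain and returns the raw HTML of the requested page. The function names `buildProxyUrl` and `fetchViaProxy` are hypothetical, and `fetch` is used here in place of a hand-written XMLHttpRequest call.

```javascript
// Endpoint name taken from the example above; adjust it to your own server.
const PROXY_ENDPOINT = "page.php";

// Build the same-origin proxy URL. The target URL must be encoded so that
// its own "?" and "&" characters are not read as parameters of page.php.
function buildProxyUrl(targetUrl, endpoint = PROXY_ENDPOINT) {
  return endpoint + "?url=" + encodeURIComponent(targetUrl);
}

// Ask the proxy for the page and resolve with its HTML as a string.
// fetchImpl is injectable so the function can be exercised without a network.
async function fetchViaProxy(targetUrl, fetchImpl = fetch) {
  const response = await fetchImpl(buildProxyUrl(targetUrl));
  if (!response.ok) {
    throw new Error("Proxy request failed with status " + response.status);
  }
  return response.text();
}
```

A call such as `fetchViaProxy("http://example.com/").then(html => { /* parse here */ })` would then hand the HTML to the client-side parsing code. Because the server script forwards requests to whatever URL it is given, it should restrict which URLs it accepts; otherwise it acts as an open proxy.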


 