
Scraping with Node.js and jQuery

I'm trying to follow this tutorial on scraping with Node.js and jQuery:

http://net.tutsplus.com/tutorials/javascript-ajax/how-to-scrape-web-pages-with-node-js-and-jquery/

The tutorial includes code that reads like this:

request({uri:"http://events.sfgate.com/search?swhat=&swhen=&swhere=San+Francisco&commit=Search&st_select=any&search=true&svt=text&srss="},function(err,response,body){

jsdom.env({
html: "http://events.sfgate.com/search?swhat=&swhen=&swhere=San+Francisco&commit=Search&st_select=any&search=true&svt=text&srss=",
src:['http://code.jquery.com/jquery-1.6.min.js'],
done: function(errors,window){
    console.log("WINDOW");
    console.log(window.jQuery);
    var $ = window.$;
    //other stuff

When I log window.jQuery or window.$ to the console, both are undefined. Shouldn't they be defined, since jsdom is supposed to embed jQuery into the page? Why is that not happening?

The problem is that you initialized jsdom with the "src" parameter, which expects an array of actual JavaScript source code (the text of jQuery in this case), not URLs pointing to the files.

If you want to pass a URL, you need to initialize it like this:

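jsdom.env(
  "http://nodejs.org/dist/",
  ["http://code.jquery.com/jquery.js"],
  function (errors, window) {
    // jQuery has been loaded into the page, so window.$ is defined here
    console.log(window.$("a").length);
  }
);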

or like this:

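jsdom.env({
  html: "http://news.ycombinator.com/",
  scripts: ["http://code.jquery.com/jquery.js"],
  done: function (errors, window) {
    // window.$ is jQuery, loaded from the URL given in `scripts`
    console.log(window.$("a").length);
  }
});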

Edit: There appears to be another mistake in your code. You first download the page with the request module, but then, instead of passing the HTML source to jsdom (the body you got from request), you tell jsdom to download the page again. If you give jsdom the page's URL as html, you don't need to call the request module at all.
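
To illustrate both points together, here is a minimal sketch. It assumes the legacy callback-style jsdom.env(config) API used in this thread, and buildEnvConfig is a hypothetical helper name, not part of jsdom. It passes the already-downloaded body as html, and puts jQuery in "scripts" when given a URL or in "src" when given raw source text:

```javascript
// Hypothetical helper (not part of jsdom): builds the options object for the
// legacy callback-style jsdom.env(config) call discussed above.
function buildEnvConfig(pageHtml, jquery, done) {
  // `html` receives the markup already downloaded, so jsdom does not fetch it again.
  var config = { html: pageHtml, done: done };
  if (/^https?:\/\//i.test(jquery)) {
    // A URL belongs in `scripts`; jsdom fetches and runs it.
    config.scripts = [jquery];
  } else {
    // Raw JavaScript source text belongs in `src`.
    config.src = [jquery];
  }
  return config;
}

// Usage sketch, assuming the `request` package and a jsdom version that still
// provides jsdom.env are installed, and `pageUrl` holds the page address:
//
// request({ uri: pageUrl }, function (err, response, body) {
//   if (err) throw err;
//   jsdom.env(buildEnvConfig(body, "http://code.jquery.com/jquery.js",
//     function (errors, window) {
//       console.log(typeof window.$);
//     }));
// });
```

Keeping the download in request and handing only the body to jsdom means the page is fetched once. Alternatively, drop request entirely and pass the page URL as html.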


 