Scraping with Node.js and jQuery

Question

I'm trying to follow this tutorial on scraping with node and jquery:

http://net.tutsplus.com/tutorials/javascript-ajax/how-to-scrape-web-pages-with-node-js-and-jquery/

Within that they have some code that reads like this:

request({uri:"http://events.sfgate.com/search?swhat=&swhen=&swhere=San+Francisco&commit=Search&st_select=any&search=true&svt=text&srss="},function(err,response,body){

jsdom.env({
html: "http://events.sfgate.com/search?swhat=&swhen=&swhere=San+Francisco&commit=Search&st_select=any&search=true&svt=text&srss=",
src:['http://code.jquery.com/jquery-1.6.min.js'],
done: function(errors,window){
    console.log("WINDOW");
    console.log(window.jQuery);
    var $ = window.$;
    //other stuff

When I console.log window.jQuery or window.$, both are undefined. Shouldn't they be defined, since jsdom should embed jquery into the page? Why is that not happening?

Answer 1

The problem is that you initialized it with the "src" parameter, which should contain an array of the actual source code of the javascript files (jquery in this case), not the url of the file.
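As a rough sketch of what "src" expects, the helper below builds an options object for the callback-style jsdom.env API used in the question, with jQuery passed as a string of source code. The helper name buildSrcConfig and the local path ./jquery.js in the usage comment are assumptions for illustration only; they are not part of the original code or of jsdom.

```javascript
// Hypothetical helper (not part of jsdom): builds jsdom.env options where
// `src` holds JavaScript source text rather than URLs.
function buildSrcConfig(html, scriptSources, done) {
  scriptSources.forEach(function (source) {
    // Entries in `src` are treated as script text, so a URL here
    // would not cause the file to be downloaded.
    if (/^https?:\/\//i.test(source.trim())) {
      throw new Error("src expects source code, got a URL: " + source);
    }
  });
  return { html: html, src: scriptSources, done: done };
}

// Usage sketch, assuming an older jsdom that still has jsdom.env
// and a local copy of jQuery saved as ./jquery.js:
//
//   var fs = require("fs");
//   var jsdom = require("jsdom");
//   var jquery = fs.readFileSync("./jquery.js", "utf-8");
//   jsdom.env(buildSrcConfig("<p>hi</p>", [jquery], function (errors, window) {
//     console.log(window.$("p").text());
//   }));
```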

If you want to use the url, you need to initialize it like this:

jsdom.env(
  "http://nodejs.org/dist/",
  ["http://code.jquery.com/jquery.js"],
  function (errors, window) {
    // jQuery is available here as window.$
  }
);

or like this:

jsdom.env({
  html: "http://news.ycombinator.com/",
  scripts: ["http://code.jquery.com/jquery.js"],
  done: function (errors, window) {
    // jQuery is available here as window.$
  }
});

Edit: There is another mistake in your code (if I'm not mistaken...): you first download the page with the request module, but then, instead of passing the html source to jsdom (by passing it the body you got from request), you tell jsdom to download the page again. If you give jsdom the url of the page as html then you don't need to call the request module.
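One way to combine both fixes might look like the sketch below: the page is fetched once with request, and the downloaded body (not the url) is passed to jsdom, with jQuery loaded by url through "scripts". The function name scrapeOnce is hypothetical, and request and jsdom are passed in as arguments so the snippet has no hard dependency on either module. It assumes the older callback-style jsdom.env API shown above.

```javascript
// Sketch: download the page once, then give jsdom the markup itself.
// `scrapeOnce` is an illustrative name; `request` and `jsdom` are the
// modules (or compatible stand-ins) supplied by the caller.
function scrapeOnce(request, jsdom, pageUrl, callback) {
  request({ uri: pageUrl }, function (err, response, body) {
    if (err) {
      return callback(err);
    }
    jsdom.env({
      html: body, // markup already downloaded by request
      scripts: ["http://code.jquery.com/jquery.js"], // a url, so `scripts` not `src`
      done: function (errors, window) {
        // Treat both a single error and a non-empty error list as failure.
        if (errors && errors.length !== 0) {
          return callback(errors);
        }
        callback(null, window.$);
      }
    });
  });
}
```

With the real modules this would be called as scrapeOnce(require("request"), require("jsdom"), url, function (err, $) { ... }), assuming an installed jsdom version that still provides jsdom.env (it was removed in jsdom 10).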

Answer 1
0 2013-01-24 13:54:45
