Scraping with node.js and jquery

I'm trying to follow this tutorial on scraping with node and jquery:

http://net.tutsplus.com/tutorials/javascript-ajax/how-to-scrape-web-pages-with-node-js-and-jquery/

The tutorial includes some code that reads like this:

request({uri: "http://events.sfgate.com/search?swhat=&swhen=&swhere=San+Francisco&commit=Search&st_select=any&search=true&svt=text&srss="}, function (err, response, body) {
  jsdom.env({
    html: "http://events.sfgate.com/search?swhat=&swhen=&swhere=San+Francisco&commit=Search&st_select=any&search=true&svt=text&srss=",
    src: ['http://code.jquery.com/jquery-1.6.min.js'],
    done: function (errors, window) {
      console.log("WINDOW");
      console.log(window.jQuery);
      var $ = window.$;
      //other stuff
    }
  });
});

When I console.log window.jQuery or window.$, both are undefined. Shouldn't they be defined, since jsdom is supposed to embed jquery into the page? Why is that not happening?

The problem is that you initialized it with the "src" parameter, which should contain an array of the actual source code of the javascript files (jquery in this case), not the url to the file.
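To illustrate what "src" expects, here is a minimal sketch that reads jQuery from disk and passes its text to jsdom. It assumes an older jsdom release that still provides jsdom.env (it was removed in later versions) and a local copy of jQuery at a path you supply. The names buildSrcConfig and scrapeWithLocalJquery are hypothetical helpers made up for this example, not part of jsdom.

```javascript
// Sketch only: assumes an old jsdom release where jsdom.env() exists,
// and a local jQuery file (the path is supplied by the caller).
var fs = require("fs");

// Hypothetical helper: builds the config object for jsdom.env().
// "src" takes JavaScript source text, so the contents of the jQuery
// file are passed in, not a URL pointing at it.
function buildSrcConfig(page, jquerySource, done) {
  return {
    html: page,           // URL or HTML string, as in the question
    src: [jquerySource],  // array of source code strings
    done: done
  };
}

// Hypothetical helper: reads jQuery from disk and hands it to jsdom.
function scrapeWithLocalJquery(page, jqueryPath, callback) {
  var jsdom = require("jsdom"); // loaded lazily; must be installed separately
  var jquerySource = fs.readFileSync(jqueryPath, "utf-8");
  jsdom.env(buildSrcConfig(page, jquerySource, function (errors, window) {
    if (errors) { return callback(errors); }
    callback(null, window.jQuery);
  }));
}
```

A call might look like scrapeWithLocalJquery("http://news.ycombinator.com/", "./jquery.js", function (err, $) { ... }), where "./jquery.js" is a placeholder path.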

If you want to use the url, you need to initialize it like this:

jsdom.env(
  "http://nodejs.org/dist/",
  ["http://code.jquery.com/jquery.js"],
  function (errors, window) {
    // window.$ is available here
  }
);

or like this:

jsdom.env({
  html: "http://news.ycombinator.com/",
  scripts: ["http://code.jquery.com/jquery.js"],
  done: function (errors, window) {
    // window.$ is available here
  }
});

Edit: There is another mistake in your code (if I'm not mistaken): you first download the page with the request module, but then, instead of passing the HTML source to jsdom (by passing it the body you got from request), you tell jsdom to download the page again. If you give jsdom the url of the page as html, then you don't need to call the request module at all.
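If you do want to keep the request call, a rough sketch of the download-once variant could look like the following. It assumes the request module and an older jsdom release that still has jsdom.env. The function name scrapePage and the deps argument are hypothetical. The two modules are passed in as arguments only so the sketch can be tried without network access.

```javascript
// Sketch only: downloads the page once with request, then gives the
// resulting HTML string (body) to jsdom instead of the page URL.
// deps.request and deps.jsdom are injected by the caller (an assumption
// made for this example so the function is easy to exercise).
function scrapePage(pageUrl, deps, callback) {
  deps.request({ uri: pageUrl }, function (err, response, body) {
    if (err) { return callback(err); }
    deps.jsdom.env({
      html: body, // reuse the HTML that request already downloaded
      scripts: ["http://code.jquery.com/jquery.js"], // URL, so "scripts" not "src"
      done: function (errors, window) {
        if (errors) { return callback(errors); }
        callback(null, window);
      }
    });
  });
}
```

With the real modules installed, it would be called as scrapePage(url, { request: require("request"), jsdom: require("jsdom") }, function (err, window) { var $ = window.$; /* ... */ }).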
