How to get page content using JavaScript or jQuery

I will have a widget on a remote page. In the widget, I want JavaScript or jQuery to get all the article content from the webpage and send it back to my website. I need only the article content, not the other information on the page. I would like the script to send the remote page's URL, the page content, the title text, and the h1 text, all as plain text without any HTML tags. Is this possible?

The script I am making is like Google AdSense. Also, I'll be using C# on my backend server.

Will something like this work? http://blog.nparashuram.com/2009/08/screen-scraping-with-javascript-firebug.html

My suggestion, if it's not too much data, would be to use a beacon:

var beac = new Image();
beac.onload = function () {
  // do something on completion
};
// Placeholder URL: point it at an endpoint on your own server
beac.src = "http://yourdomain/something.php?var=asdasd&key=someUniqueString";

This lets you send a moderate amount of data to a server on another domain, provided you don't need anything back. Because the data travels in the query string of an image request, the payload is limited by the maximum URL length.
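
As a rough sketch of how the widget could put the beacon together: the snippet below reads the URL, title, h1 text, and article text from the page the widget runs on, then builds a URL-encoded beacon address. The function names (collectPageInfo, buildBeaconUrl), the endpoint https://yourdomain.example/collect, the 2000-character limit, and the guess that the article sits in an <article> element are all assumptions made for illustration; none of them come from the question.

```javascript
// Collect the fields the question asks for from a document-like object.
// textContent returns text only, so no HTML tags are included.
function collectPageInfo(doc, loc) {
  var h1 = doc.querySelector("h1");
  // Assumption: the article lives in an <article> element; fall back to <body>.
  var article = doc.querySelector("article") || doc.body;
  return {
    url: loc.href,
    title: doc.title,
    h1: h1 ? h1.textContent.trim() : "",
    content: article ? article.textContent.replace(/\s+/g, " ").trim() : ""
  };
}

// Build the beacon URL, URL-encoding every key and value.
// Returns null when the result is longer than maxLength, so the caller
// can decide what to do instead of sending a truncated request.
function buildBeaconUrl(endpoint, fields, maxLength) {
  var pairs = Object.keys(fields).map(function (key) {
    return encodeURIComponent(key) + "=" + encodeURIComponent(fields[key]);
  });
  var url = endpoint + "?" + pairs.join("&");
  return url.length <= maxLength ? url : null;
}

// Browser-only usage; skipped where there is no DOM.
if (typeof document !== "undefined" && typeof Image !== "undefined") {
  var beaconUrl = buildBeaconUrl(
    "https://yourdomain.example/collect", // hypothetical endpoint
    collectPageInfo(document, window.location),
    2000 // assumed conservative URL length limit
  );
  if (beaconUrl) {
    new Image().src = beaconUrl;
  }
}
```

A full article will often exceed any reasonable URL length, which is why this only suits the "not too much data" case; when buildBeaconUrl returns null, sending just the page URL and letting the server fetch the rest is one fallback.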

In short, you can't do this, at least not in the way you were expecting. For security reasons, browsers enforce a same-origin policy that prevents a script from making Ajax requests to another domain and reading the response.

Your best option is to do this on your server and make the request to it. I can't say how you'd do this on the server, since your question doesn't say which framework you're on, but let's say it's PHP: you'd have that page take a URL (or something you can generate the URL from) and return a JSON object containing the properties you listed. The jQuery part would look something like this:

$("a").click(function() {
  $.ajax({
    url: 'myPage.php',
    data: { url: $(this).attr("href") },
    dataType: 'json',
    success: function(data) {
      //use the properties, data.url, data.content, data.title, etc...
    }
  });
});

Or, the short form using $.getJSON():

  $.getJSON('myPage.php', { url: $(this).attr("href") }, function(data) {
    //use the properties, data.url, data.content, data.title, etc...
  });

All of the above notwithstanding, you're better off sending the URL to your server and doing this completely server-side; it'll be less work. If you're aiming to view the client's page as they would see it, well, this is exactly what the same-origin policy is in place to prevent. For example, what if instead of an article it was their online banking? You can see why this is prohibited :)
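
To give a feel for the server-side step, here is a minimal sketch of turning an already-downloaded HTML string into the JSON properties mentioned above (url, title, h1, content). It is written in JavaScript only to match the other snippets here; the same steps apply in PHP or C#. The helper names (stripTags, firstMatch, extractPageInfo) are made up for this example, and the regex-based tag stripping is a deliberately naive assumption: it does not decode entities such as &amp;amp; and will trip on unusual markup, so a real HTML parser would be the safer choice.

```javascript
// Remove <script>/<style> blocks, then all remaining tags, then collapse whitespace.
function stripTags(html) {
  return html
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// Return the plain text of the first <tag>...</tag> element, or "" if absent.
function firstMatch(html, tag) {
  var re = new RegExp("<" + tag + "\\b[^>]*>([\\s\\S]*?)</" + tag + ">", "i");
  var m = re.exec(html);
  return m ? stripTags(m[1]) : "";
}

// Build the object the server would return as JSON.
// Assumption: the article is in an <article> element; fall back to <body>.
function extractPageInfo(url, html) {
  return {
    url: url,
    title: firstMatch(html, "title"),
    h1: firstMatch(html, "h1"),
    content: firstMatch(html, "article") || firstMatch(html, "body")
  };
}
```

The server page would fetch the HTML for the URL it receives, pass it through something like extractPageInfo, and send back JSON.stringify of the result for the jQuery success callback to read.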
