How to get page content using JavaScript or jQuery

I will have a widget on a remote page. In the widget I want JavaScript or jQuery to get all the article content from the webpage and send it back to my website. I need only the article content, not the other information on the page. I would like the script to send the remote page's URL, page content, title text, and h1 text, without any HTML tags. Is this possible?

The script I am making is like Google AdSense. Also, I'll be using C# on my backend server.

Will something like this work? http://blog.nparashuram.com/2009/08/screen-scraping-with-javascript-firebug.html

My suggestion, if it's not too much data, would be to use a beacon.

var beac = new Image();
beac.onload = function () {
  // do something on completion
};
// the data travels in the query string of the image request
beac.src = "yourdomain/something.php?var=asdasd&key=someUniqueString";

This allows you to send a moderate amount of data to a server on another domain, provided you don't need anything back.
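As a rough sketch of how the beacon could carry the fields from the question: assuming the widget is included as a script on the page itself (not inside an iframe served from another domain), it can read that page's title and h1 directly. The function names (collectPageInfo, buildBeaconUrl, sendPageInfo) and the endpoint URL are made up for illustration.

```javascript
// Sketch only: all function names here are hypothetical.

// Gather the URL, title, and first <h1> text of the page the widget runs on.
function collectPageInfo(doc, loc) {
  var h1 = doc.querySelector("h1");
  return {
    url: loc.href,
    title: doc.title,
    // textContent returns the text only, with no HTML tags
    h1: h1 ? h1.textContent.trim() : ""
  };
}

// Build the beacon URL, percent-encoding every key and value
// so that characters such as & and ? survive the trip.
function buildBeaconUrl(endpoint, fields) {
  var pairs = Object.keys(fields).map(function (key) {
    return encodeURIComponent(key) + "=" + encodeURIComponent(fields[key]);
  });
  return endpoint + "?" + pairs.join("&");
}

// Browser-only: fire the beacon. The endpoint is a placeholder.
function sendPageInfo(endpoint) {
  var beacon = new Image();
  beacon.src = buildBeaconUrl(endpoint, collectPageInfo(document, window.location));
}
```

In the widget you would call something like sendPageInfo("https://yourdomain.example/collect"). Everything has to fit in the URL, so this suits the title and h1 better than a full article body.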

In short, you can't do this, at least not in the way you were expecting. For security reasons, browsers enforce a same-origin policy that prevents your script from making Ajax requests to another domain and reading the response.
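To make "same origin" concrete: two URLs share an origin only when their scheme, host, and port all match. A minimal sketch, assuming an environment that provides the standard URL class; sameOrigin is a hypothetical helper name, not part of any library.

```javascript
// Sketch only: sameOrigin is a hypothetical helper.
// URL#origin combines scheme, host, and port (default ports are dropped).
function sameOrigin(a, b) {
  return new URL(a).origin === new URL(b).origin;
}
```

For example, http://example.com/a and http://example.com:80/b share an origin, while http://example.com and https://example.com do not, and neither do http://example.com and http://api.example.com.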

Your best option is to do this on your server and make the request to it. I can't speak to how you'd do this on the server, since your question doesn't include which framework you're on, but let's say it's PHP. You'd have that page take a URL, or something you can generate the URL from, and return a JSON object containing the properties you listed. The jQuery part would look something like this:

$("a").click(function() {
  $.ajax({
    url: 'myPage.php',
    data: { url: $(this).attr("href") },
    dataType: 'json',
    success: function(data) {
      //use the properties, data.url, data.content, data.title, etc...
    }
  });
});

Or, the short form using $.getJSON():

$.getJSON('myPage.php', { url: $(this).attr("href") }, function(data) {
  //use the properties, data.url, data.content, data.title, etc...
});

All of the above notwithstanding, you're better off sending the URL to your server and doing this completely server-side; it'll be less work. If you're aiming to view the client's page as they would see it, well, this is exactly what the same-origin policy is in place to prevent. For example, what if instead of an article it was their online banking? You can see why this is prohibited :)
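Once the server has fetched the page, it still has to reduce the HTML to the title, h1, and plain text. Here is a deliberately naive sketch of that step, written in JavaScript only for illustration (a C# backend would need its own version, ideally built on a real HTML parser). extractFields is a hypothetical name, the regular expressions will miss malformed markup, and entities such as &amp;amp; are not decoded.

```javascript
// Sketch only: extractFields is a hypothetical helper, and regex-based
// HTML handling is an approximation, not a substitute for a parser.
function extractFields(html) {
  // Reduce an HTML fragment to plain text.
  function textOf(fragment) {
    return fragment
      .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, " ") // drop script/style bodies
      .replace(/<[^>]+>/g, " ")                                // drop remaining tags
      .replace(/\s+/g, " ")                                    // collapse whitespace
      .trim();
  }

  // Plain text of the first <name>...</name> element, or "" if absent.
  function firstTag(name) {
    var pattern = new RegExp("<" + name + "\\b[^>]*>([\\s\\S]*?)<\\/" + name + ">", "i");
    var match = pattern.exec(html);
    return match ? textOf(match[1]) : "";
  }

  return {
    title: firstTag("title"),
    h1: firstTag("h1"),
    // Fall back to the whole document when there is no <body> element.
    content: firstTag("body") || textOf(html)
  };
}
```

Note that this returns all of the body text, not just the article. Picking out the article itself depends on each site's markup and is not attempted here.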
